Concrete Compressive Strength

Concrete is the most important material in civil engineering. The concrete compressive strength is a highly nonlinear function of age and ingredients. These ingredients include cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate.

The actual concrete compressive strength (MPa) for a given mixture under a specific age (days) was determined from laboratory. Data is in raw form (not scaled).

Variable Information:

Given is the variable name, variable type, the measurement unit and a brief description. The concrete compressive strength is the regression problem. The order of this listing corresponds to the order of numerals along the rows of the database.

Name -- Data Type -- Measurement -- Description

  • Cement (component 1) -- quantitative -- kg in a m3 mixture -- Input Variable
  • Blast Furnace Slag (component 2) -- quantitative -- kg in a m3 mixture -- Input Variable
  • Fly Ash (component 3) -- quantitative -- kg in a m3 mixture -- Input Variable
  • Water (component 4) -- quantitative -- kg in a m3 mixture -- Input Variable
  • Superplasticizer (component 5) -- quantitative -- kg in a m3 mixture -- Input Variable
  • Coarse Aggregate (component 6) -- quantitative -- kg in a m3 mixture -- Input Variable
  • Fine Aggregate (component 7) -- quantitative -- kg in a m3 mixture -- Input Variable
  • Age -- quantitative -- Day (1~365) -- Input Variable (Concrete hardens with time and strength increases. Usually, concrete is tested after 28 days.)
  • Concrete compressive strength -- quantitative -- MPa -- Output Variable

Let's frame the problem:

- It is clearly a typical supervised learning task since you are given labeled training examples (each instance comes with the expected output, i.e., the Concrete compressive strength ). 
- It is also a typical regression task, since you are asked to predict a value. More specifically, this is a multivariate regression problem since the system will use multiple features to make a prediction

Select a Performance Measure

Next step is to select a performance measure. A typical performance measure for regression problems is the Root Mean Square Error (RMSE). It gives an idea of how much error the system typically makes in its predictions, with a higher weight for large errors.

Import necessary modules

In [1]:
#Import all the necessary modules
import pandas as pd
import numpy as np
import os
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams["figure.figsize"] = (10,8)


import warnings
warnings.filterwarnings('ignore')

#!jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10

from scipy.stats import zscore
from sklearn.decomposition import PCA


from IPython.core.interactiveshell import InteractiveShell 
InteractiveShell.ast_nodeinteractivity = 'all'


from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, AdaBoostRegressor
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
import xgboost 

Load the Data

In [2]:
df = pd.read_csv("concrete_1.csv")

Take a Quick Look at the Data Structure

In [3]:
df.head()
Out[3]:
cement slag ash water superplastic coarseagg fineagg age strength
0 141.3 212.0 0.0 203.5 0.0 971.8 748.5 28 29.89
1 168.9 42.2 124.3 158.3 10.8 1080.8 796.2 14 23.51
2 250.0 0.0 95.7 187.4 5.5 956.9 861.2 28 29.22
3 266.0 114.0 0.0 228.0 0.0 932.0 670.0 28 45.85
4 154.8 183.4 0.0 193.3 9.1 1047.4 696.7 28 18.29
In [4]:
#The info() method is useful to get a quick description of the data, in particular the total number of rows, and each attribute’s type and number of non-null values
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1030 entries, 0 to 1029
Data columns (total 9 columns):
cement          1030 non-null float64
slag            1030 non-null float64
ash             1030 non-null float64
water           1030 non-null float64
superplastic    1030 non-null float64
coarseagg       1030 non-null float64
fineagg         1030 non-null float64
age             1030 non-null int64
strength        1030 non-null float64
dtypes: float64(8), int64(1)
memory usage: 72.5 KB
In [5]:
#The describe() method shows a summary of the numerical attributes
df.describe().T
Out[5]:
count mean std min 25% 50% 75% max
cement 1030.0 281.167864 104.506364 102.00 192.375 272.900 350.000 540.0
slag 1030.0 73.895825 86.279342 0.00 0.000 22.000 142.950 359.4
ash 1030.0 54.188350 63.997004 0.00 0.000 0.000 118.300 200.1
water 1030.0 181.567282 21.354219 121.80 164.900 185.000 192.000 247.0
superplastic 1030.0 6.204660 5.973841 0.00 0.000 6.400 10.200 32.2
coarseagg 1030.0 972.918932 77.753954 801.00 932.000 968.000 1029.400 1145.0
fineagg 1030.0 773.580485 80.175980 594.00 730.950 779.500 824.000 992.6
age 1030.0 45.662136 63.169912 1.00 7.000 28.000 56.000 365.0
strength 1030.0 35.817961 16.705742 2.33 23.710 34.445 46.135 82.6
In [6]:
def basic_details(df):
    b = pd.DataFrame()
    b['Missing value'] = df.isnull().sum()
    b['N unique value'] = df.nunique()
    b['dtype'] = df.dtypes
    return b
basic_details(df)
Out[6]:
Missing value N unique value dtype
cement 0 278 float64
slag 0 185 float64
ash 0 156 float64
water 0 195 float64
superplastic 0 111 float64
coarseagg 0 284 float64
fineagg 0 302 float64
age 0 14 int64
strength 0 845 float64

Observations

  • There are 9 columns and 1030 rows
  • There are no missing values in this dataset. However, slag, ash and superplastic columns have minimum values as zero's, need to check if they indicate missing values
  • All the columns are numeric columns, there are no categorical columns in this dataset
  • The std row shows the standard deviation, which measures how dispersed the values are
  • The 25%, 50%, and 75% rows show the corresponding percentiles: a percentile indicates the value below which a given percentage of observations in a group of observations falls. For example, 25% of the observation have water component lower than 164, while 50% have water compoenent lower than 185 and 75% have water compoenent lower than 192. These are often called the 25th percentile (or 1st quartile), the median, and the 75th percentile (or 3rd quartile).
In [7]:
target = 'strength'
X = df.loc[:, df.columns!=target]
y = df.loc[:, df.columns==target]

Split the data into train and test

While spliting the data, we may want to ensure that the test set is representative of the various categories of strength in the whole dataset. Since the strength is a continuous numerical attribute, we will have to trick Python into interpreting your continuous numerical y variable as a categorical variable instead. By creating bins, and passing y variable into an ndarray containing those bins and the corresponding values.

In [8]:
# Create the bins.  My `y` variable has 1030 observations, and I want 50 bins.

bins = np.linspace(0, df.shape[0], 50)

# Save your Y values in a new ndarray,
# broken down by the bins created above.

y_binned = np.digitize(y, bins)

# Pass y_binned to the stratify argument,
# and sklearn will handle the rest

#Test train split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y_binned,random_state=2)

Feature Selection Using Random Forest

Discover and Visualize the Data to Gain Insights

We have taken a quick glance at the data to get a general understanding of the kind of data. Now the goal is to go a little bit more in depth. First, we will make sure we have put the test set aside and we are only exploring the training set. Also, if the training set is very large, we may want to sample an exploration set, to make manipulations easy and fast. In our case, the set is quite small so we can just work directly on the full set.

In [9]:
df_temp = pd.concat([X_train,y_train],axis=1)

Lets look at the distribution plot (Univariate Analysis)

In [10]:
df_temp.hist(bins=30, figsize=(20,15))
plt.show()
In [11]:
import itertools
cols = [i for i in df_temp.columns if i not in 'strength']
length = len(cols)
cs = ["b","r","g","c","m","k","lime","c"]
fig = plt.figure(figsize=(13,25))

for i,j,k in itertools.zip_longest(cols,range(length),cs):
    plt.subplot(5,2,j+1)
    ax = sns.distplot(df_temp[i],color=k,rug=True)
    ax.set_facecolor("w")
    quartile_1,quartile_3 = np.percentile(df[i],[25,75])
    plt.axvline(df_temp[i].median(),linestyle="dashed",label="median",color="k")
    plt.axvline(df_temp[i].mean(),linestyle="dashed",label="mean",color="b")
    plt.axvline(np.percentile(df_temp[i],25),linestyle="dashed",label="q1",color="r")
    plt.axvline(np.percentile(df_temp[i],75),linestyle="dashed",label="q3",color="g")
    plt.legend(loc="best")
    plt.title(i,color="navy")
    plt.xlabel("")
In [12]:
ax = sns.distplot(df_temp["strength"],color=k,rug=True)
ax.set_facecolor("w")
plt.axvline(df_temp["strength"].median(),linestyle="dashed",label="median",color="k")
plt.axvline(df_temp["strength"].mean(),linestyle="dashed",label="mean",color="b")
plt.axvline(np.percentile(df_temp["strength"],25),linestyle="dashed",label="q1",color="r")
plt.axvline(np.percentile(df_temp["strength"],75),linestyle="dashed",label="q3",color="g")
plt.legend(loc="best")
plt.title(i,color="navy")
plt.xlabel("")
Out[12]:
Text(0.5,0,'')

Observations

  • Distribution of strength,cement variable is close to normal.
  • However, we can see some long tails toward right for slag, water ,age and superplastic components.

Plot strength against age

In [13]:
sns.lineplot("age","strength",data=df)
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x1600842db00>

We see that with age the strength of cement increases upto a certain level, and then after 100 days there is a sharp decrease in the strength and also after 250 days there seems to be decrease in the strength, it could be due to more moisture absorption from the atmosphere.

In [14]:
sns.set_style('darkgrid')
g = sns.FacetGrid(df_temp,hue="age",palette='coolwarm',size=6,aspect=2)
g = g.map(plt.hist,'strength',bins=20,alpha=0.7)
In [15]:
df_temp["age"].value_counts().sort_values()
Out[15]:
120      2
1        2
360      6
270     10
365     11
180     15
91      19
100     35
14      40
90      42
56      70
3       80
7       89
28     300
Name: age, dtype: int64

Box Plots to visualize the outliers

In [16]:
sns.set(context="paper", font="monospace")
# Create a figure instance
fig = plt.figure(1, figsize=(18, 12))

# Create an axes instance
ax = fig.add_subplot(111)

g = sns.boxplot(data=df_temp, ax=ax, color="blue")
g.set_xticklabels(df_temp.columns,rotation=90)

# Add transparency to colors
for patch in g.artists:
    r, g, b, a = patch.get_facecolor()
    patch.set_facecolor((r, g, b, .3))  

Observations

  • There are outliers within slag, age, water component. The outliers present in numeric feature can be dealt by following ways based on the domain knowledge
    • delete the outliers (there could be loss of data due to this)
    • impute ouliers with mean/median
    • impute with lower and upper bound values.

For now, we would impute the ouliers with 1% and 95% values

In [17]:
def outlier(df,columns):
    for i in columns:
        quartile_1,quartile_3 = np.percentile(df[i],[25,75])
        quartile_f,quartile_l = np.percentile(df[i],[1,95])
        IQR = quartile_3-quartile_1
        lower_bound = quartile_1 - (1.5*IQR)
        upper_bound = quartile_3 + (1.5*IQR)
        print(i,lower_bound,upper_bound,quartile_f,quartile_l)
                
        df[i].loc[df[i] < lower_bound] = quartile_f
        df[i].loc[df[i] > upper_bound] = quartile_l    
        
outlier(df_temp, ['age','fineagg','water'])
age -49.0 119.0 3.0 180.0
fineagg 574.7499999999999 973.5500000000001 594.0 891.9
water 125.99999999999999 231.60000000000002 129.4 228.0

Bivariate Analysis using Correlation matrix and Pair plots

Correlation Matrix

In [18]:
def correlation_matrix(df):
    corrmat = df.corr()
    top_corr_features = corrmat.index
    plt.figure(figsize=(10,8))
    #plot heat map
    g=sns.heatmap(df[top_corr_features].corr(),annot=True,cmap="RdYlGn")


correlation_matrix(df_temp)
In [19]:
# Lets check for highly correlated variables
cor= df_temp.corr()
cor.loc[:,:] = np.tril(cor,k=-1)
cor=cor.stack()
cor[(cor > 0.55) | (cor< -0.55)]
Out[19]:
superplastic  water   -0.643203
dtype: float64

Pair Plot

In [20]:
sns.pairplot(df_temp,diag_kind='kde')
Out[20]:
<seaborn.axisgrid.PairGrid at 0x16007f95320>

Observations

  • We can see density plots along the diagonal, there are multiple gaussians in every parameter
    • cement parameter seems to have around 4 gaussians (3 are clearly visible and 1 is slightly visible)
    • slag parameter seems to have 2 gaussians
    • ash parameter seems to have 2 gaussians, first gaussian contains mostly zero's
    • similarly, we can see 4 gaussians for water, superplastic, coarseagg, fineagg
    • age seems to have around 5 gaussians
  • Cement has a good correlation with strength, the more cement stronger the concrete.
  • With age, we can see that the strength of cement decreases, it could be due to more moisture absorption from the atmosphere.
  • Also, we can see that higher the water, the strength of concrete decreases, its good to have lesser water
  • superplasticizers has negative correlation with water, superplasticizers always help decrease water - hence an increase in strength
  • Following Features are listed based on the order of importance (ascending order) to predict strength
    • cement
    • age
    • superplastic
    • water
    • fineagg
    • coarseagg
    • slag
    • ash

We will analyse these features further, to identify key features to predict stength

Blast Furnace Slag

  • Blast furnace slag powder as partial replacement of cement, from pair panel we can see that there is a negative correlation between cement and slag components i.e as the proportion of slag component increases the proportion of cement is low.
  • Also, having zero values in slag column seems to be acceptable as its a partial replacement of cement and in cases where the values are zero we can consider that slag component was not being used.

Fly Ash

  • Fly Ash, is also used in replacement of cement, from pair panel agaian we can see that there is negative correlation between cement and Fly Ash component
  • Even here having zero values in Fly Ash, column seems to be acceptable as its a partial replacement of cement and in cases where the values are zero we can consider that Fly Ash component was not being used.
  • Fly Ash has negative correlation with water
In [21]:
## check Blast Furnace Slag, Ash and Cement proportion

df_temp[(df_temp['slag']!=0) & (df_temp['ash']!=0)  ].head()
Out[21]:
cement slag ash water superplastic coarseagg fineagg age strength
864 213.7 98.1 24.5 181.7 6.9 1065.8 785.4 100.0 53.90
356 136.0 162.0 126.0 172.0 10.0 923.0 764.0 28.0 29.07
909 314.0 145.0 113.0 179.0 8.0 869.0 690.0 28.0 46.23
551 260.0 101.0 78.0 171.0 10.0 936.0 763.0 28.0 49.77
446 165.0 128.5 132.1 175.1 8.1 1005.8 746.6 100.0 55.02

Superplasticizer

  • Plasticizers are also often used when Fly ash is added to concrete to improve strength.
  • Superplasticizers always help decrease batching water thus porosity - hence an increase in strength
  • Having zero values in Superplasticizers also column seems to be acceptable as its a admixture used to improve the strength of concrete and not a required component.
In [22]:
df_temp[(df_temp['ash']!=0)  &  (df_temp['superplastic']==0)]
Out[22]:
cement slag ash water superplastic coarseagg fineagg age strength
847 165.0 0.0 143.6 163.8 0.0 1005.6 900.9 3.0 14.40
1000 165.0 0.0 143.6 163.8 0.0 1005.6 900.9 100.0 37.96
586 165.0 0.0 143.6 163.8 0.0 1005.6 900.9 56.0 36.56
242 165.0 0.0 143.6 163.8 0.0 1005.6 900.9 28.0 26.20
36 165.0 0.0 143.6 163.8 0.0 1005.6 900.9 14.0 16.88
In [23]:
## Function to train and evaluate models
def train_evaluate_model(X_train,y_train,X_test,y_test):
    classifiers = [
        Ridge(alpha=0.5),
        Lasso(alpha=0.1),
        SVR(),
        GradientBoostingRegressor(random_state=2),
        AdaBoostRegressor(random_state=2),
        RandomForestRegressor(random_state=2),
        LinearRegression(),
        xgboost.XGBRegressor(random_state=2),
        DecisionTreeRegressor(random_state=2)
    ]

    # Logging for Visual Comparison
    log_cols=["Classifier", "RMSE Train","RMSE Test", "R2 Train", "R2 Test"]
    log = pd.DataFrame(columns=log_cols)

    for clf in classifiers:
        clf.fit(X_train, y_train)
        name = clf.__class__.__name__

        y_pred_train = clf.predict(X_train)
        mse_train = np.sqrt(mean_squared_error(y_train, y_pred_train))
        r2_train = r2_score(y_train, y_pred_train)

        y_pred_test = clf.predict(X_test)
        mse_test = np.sqrt(mean_squared_error(y_test, y_pred_test))
        r2_test = r2_score(y_test, y_pred_test)

        log_entry = pd.DataFrame([[name,mse_train,mse_test,r2_train,r2_test]], columns=log_cols)

        log = log.append(log_entry)
    log.set_index(["Classifier"],inplace=True)
    
    return log.sort_values(by=['RMSE Test'])
In [24]:
initial_model_asis = train_evaluate_model(X_train,y_train,X_test,y_test)
initial_model_asis
[19:12:03] WARNING: C:/Jenkins/workspace/xgboost-win64_release_0.90/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
Out[24]:
RMSE Train RMSE Test R2 Train R2 Test
Classifier
GradientBoostingRegressor 3.742583 5.195720 0.949697 0.903464
XGBRegressor 3.902962 5.203390 0.945294 0.903178
RandomForestRegressor 2.528433 5.685100 0.977041 0.884422
AdaBoostRegressor 7.062768 7.458793 0.820858 0.801053
DecisionTreeRegressor 1.183336 7.782888 0.994971 0.783389
Lasso 10.253483 10.678095 0.622436 0.592257
Ridge 10.253454 10.679274 0.622438 0.592167
LinearRegression 10.253454 10.679277 0.622438 0.592166
SVR 15.683336 16.557206 0.116666 0.019667

Observation

  • We see that Linear models are not giving a good performance, mainly because from the pair panel we can see that there is no linear relationship between the independent variables and the target (strength).

Lets try creating polynomial Features using degree=2 and check if that boosts the performance

In [25]:
from sklearn.preprocessing import PolynomialFeatures
polynomial_features = PolynomialFeatures(degree=2)
x_poly_train = polynomial_features.fit_transform(X_train)

x_poly_test = polynomial_features.fit_transform(X_test)

model_poly = train_evaluate_model(x_poly_train,y_train,x_poly_test,y_test)
model_poly
[19:12:04] WARNING: C:/Jenkins/workspace/xgboost-win64_release_0.90/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
Out[25]:
RMSE Train RMSE Test R2 Train R2 Test
Classifier
XGBRegressor 3.329454 4.687164 0.960190 0.921437
GradientBoostingRegressor 3.236069 4.828773 0.962392 0.916618
RandomForestRegressor 2.562804 5.596279 0.976413 0.888005
AdaBoostRegressor 6.212038 6.925869 0.861415 0.828467
DecisionTreeRegressor 1.183336 7.388302 0.994971 0.804796
LinearRegression 7.138750 7.777580 0.816983 0.783684
Ridge 7.152420 7.793956 0.816281 0.782772
Lasso 7.330082 7.947327 0.807041 0.774139
SVR 15.810526 16.673139 0.102280 0.005891

Observation:

  • With polynomial of degree 2 we see that there is a boost in performance for Linear models
  • SVR is performing very poorly. So we will exclude SVR this model from further steps

Feature Selection Using Random Forest

In [26]:
fs_model = RandomForestRegressor(n_estimators=100,oob_score=True,random_state=42)
fs_model.fit(X_train, y_train)
fs_model.feature_importances_


pd.DataFrame(fs_model.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(['Imp'],ascending=False).plot(kind='barh',figsize=[7,6])
Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x1600b0ed898>

Feature Selection Using DecisionTreeRegressor

In [27]:
fs_model_dt = DecisionTreeRegressor(random_state=42)
fs_model_dt.fit(X_train, y_train)
fs_model_dt.feature_importances_

pd.DataFrame(fs_model_dt.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(['Imp'],ascending=False).plot(kind='barh',figsize=[7,6])
Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x1600be9d908>

Inference

  • With both RandomForestRegressor and DecisionTreeRegressor, we see that cement, age, superplastic, slag and water are the key features

Feature Engineering

A key factor in concrete strength is the water-cement ratio, so lets create a new feature names "water_cement_ratio"

In [28]:
df_temp["water_cement_ratio"] = df_temp["water"]/df_temp["cement"]

Next, we would create "coarse_fine_agg_ratio" using Coarse and Fine aggregate components

In [29]:
df_temp["coarse_fine_agg_ratio"] = df_temp["coarseagg"]/df_temp["fineagg"]

With a bit of ground work around the domain , I also found that water_binder_ratio is another factor which could help in predicting the cement strength. So, lets create a new feature - 'water_binder_ratio'.

In [30]:
df_temp["water_binder_ratio"] = df_temp["water"]/(df_temp["cement"] + df_temp["ash"] + df_temp["slag"])
In [31]:
df_temp.head()
Out[31]:
cement slag ash water superplastic coarseagg fineagg age strength water_cement_ratio coarse_fine_agg_ratio water_binder_ratio
907 285.0 190.0 0.0 163.0 7.6 1031.0 685.0 28.0 53.58 0.571930 1.505109 0.343158
864 213.7 98.1 24.5 181.7 6.9 1065.8 785.4 100.0 53.90 0.850257 1.357016 0.540291
629 322.2 0.0 115.6 196.0 10.4 817.9 813.4 28.0 31.18 0.608318 1.005532 0.447693
910 331.0 0.0 0.0 192.0 0.0 879.0 825.0 3.0 13.52 0.580060 1.065455 0.580060
356 136.0 162.0 126.0 172.0 10.0 923.0 764.0 28.0 29.07 1.264706 1.208115 0.405660
In [32]:
sns.pairplot(df_temp,diag_kind='kde')
Out[32]:
<seaborn.axisgrid.PairGrid at 0x1600b292b00>
In [33]:
correlation_matrix(df_temp)

We see a lot of correlation with the composite features and the base features using which they are derived. So. lets drop cement, water , coarseagg , fineagg , ash, slag columns as we have derived new features out of these

In [34]:
# #Drop a column
df_temp.drop('cement', axis=1, inplace=True)
df_temp.drop('water', axis=1, inplace=True)
In [35]:
df_temp.drop('coarseagg', axis=1, inplace=True)
df_temp.drop('fineagg', axis=1, inplace=True)
In [36]:
df_temp.drop('ash', axis=1, inplace=True)
df_temp.drop('slag', axis=1, inplace=True)
In [37]:
df_temp.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 721 entries, 907 to 721
Data columns (total 6 columns):
superplastic             721 non-null float64
age                      721 non-null float64
strength                 721 non-null float64
water_cement_ratio       721 non-null float64
coarse_fine_agg_ratio    721 non-null float64
water_binder_ratio       721 non-null float64
dtypes: float64(6)
memory usage: 59.4 KB

Lets get pre- process the test data to create the new features

In [38]:
df_temp_test = X_test.copy()
df_temp_test["water_cement_ratio"] = df_temp_test["water"]/df_temp_test["cement"]
df_temp_test ["coarse_fine_agg_ratio"] = df_temp_test ["coarseagg"]/df_temp_test ["fineagg"]
df_temp_test["water_binder_ratio"] = df_temp_test["water"]/(df_temp_test["cement"] + df_temp_test["ash"] + df_temp_test["slag"])

# #Drop a column
df_temp_test.drop('cement', axis=1, inplace=True)
df_temp_test.drop('water', axis=1, inplace=True)
df_temp_test.drop('coarseagg', axis=1, inplace=True)
df_temp_test.drop('fineagg', axis=1, inplace=True)
df_temp_test.drop('ash', axis=1, inplace=True)
df_temp_test.drop('slag', axis=1, inplace=True)

Build a Model with the composite features created using feature engineering

In [39]:
target = 'strength'
X_train = df_temp.loc[:, df_temp.columns!=target]
y_train = df_temp.loc[:, df_temp.columns==target]

X_test = df_temp_test.loc[:, df_temp_test.columns!=target]

fe_model1 = train_evaluate_model(X_train,y_train,X_test,y_test)

fe_model1
[19:12:26] WARNING: C:/Jenkins/workspace/xgboost-win64_release_0.90/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
Out[39]:
RMSE Train RMSE Test R2 Train R2 Test
Classifier
XGBRegressor 4.060722 5.114453 0.940782 0.906460
GradientBoostingRegressor 3.931313 5.132596 0.944496 0.905795
RandomForestRegressor 2.573662 5.341333 0.976212 0.897977
AdaBoostRegressor 6.933924 6.928696 0.827334 0.828327
DecisionTreeRegressor 1.210579 7.070416 0.994737 0.821232
LinearRegression 9.512257 10.916463 0.675051 0.573849
Ridge 9.526853 10.959822 0.674053 0.570457
Lasso 9.592576 11.052483 0.669540 0.563163
SVR 12.467489 12.386998 0.441779 0.451304
In [40]:
fs_model = RandomForestRegressor(n_estimators=100,oob_score=True,random_state=42)
fs_model.fit(X_train, y_train)
fs_model.feature_importances_


pd.DataFrame(fs_model.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(['Imp'],ascending=False).plot(kind='barh',figsize=[7,6])
Out[40]:
<matplotlib.axes._subplots.AxesSubplot at 0x16014e8dbe0>
In [41]:
fs_model_dt = DecisionTreeRegressor(random_state=42)
fs_model_dt.fit(X_train, y_train)
fs_model_dt.feature_importances_

pd.DataFrame(fs_model_dt.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(['Imp'],ascending=False).plot(kind='barh',figsize=[7,6])
Out[41]:
<matplotlib.axes._subplots.AxesSubplot at 0x16014c9b048>

Lets check if quadratic features can help improve the model

In [42]:
from sklearn.preprocessing import PolynomialFeatures
polynomial_features = PolynomialFeatures(degree=2)
x_poly_train = polynomial_features.fit_transform(X_train)

x_poly_test = polynomial_features.fit_transform(X_test)

fe_model_poly = train_evaluate_model(x_poly_train,y_train,x_poly_test,y_test)
fe_model_poly
[19:12:28] WARNING: C:/Jenkins/workspace/xgboost-win64_release_0.90/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
Out[42]:
RMSE Train RMSE Test R2 Train R2 Test
Classifier
GradientBoostingRegressor 3.598725 5.247852 0.953490 0.901517
XGBRegressor 3.755970 5.275274 0.949337 0.900485
RandomForestRegressor 2.602591 5.664029 0.975675 0.885277
DecisionTreeRegressor 1.210579 6.935390 0.994737 0.827995
AdaBoostRegressor 6.604548 7.075141 0.843349 0.820993
LinearRegression 7.343030 14.289117 0.806358 0.269853
SVR 13.915801 14.332271 0.304552 0.265436
Ridge 7.409076 14.456720 0.802859 0.252624
Lasso 7.630299 14.841639 0.790911 0.212296

Observation

  • Linear models are not giving good performance, even with polynomial features of degree 2 for the compposite features. So, we will not consider Linear models.
  • Decision Tree, RandomForest and Adaboost seems to be highly overfitting the data.

Let us explore the data for any clusters as there appears to be multiple gaussians in the independent features

In [43]:
from sklearn.cluster import KMeans

#check the optimal k value
ks = range(1, 9)
inertias = []

for k in ks:
    model = KMeans(n_clusters=k)
    model.fit(df_temp.drop('strength',axis=1))
    inertias.append(model.inertia_)

plt.figure(figsize=(8,5))
plt.style.use('bmh')
plt.plot(ks, inertias, '-o')
plt.xlabel('Number of clusters, k')
plt.ylabel('Inertia')
plt.xticks(ks)
plt.show()
In [44]:
from scipy.stats import zscore
# df_std = df_temp.drop(['strength'],axis=1).apply(zscore)
df_std = df_temp.apply(zscore)
In [45]:
kmeans = KMeans(n_clusters=3,random_state=4)
kmeans.fit(df_std)
Out[45]:
KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=3, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=4, tol=0.0001, verbose=0)
In [46]:
# Check the number of data in each cluster

labels = kmeans.labels_
counts = np.bincount(labels[labels>=0])
print(counts)
[103 227 391]
In [47]:
df_std.columns
Out[47]:
Index(['superplastic', 'age', 'strength', 'water_cement_ratio',
       'coarse_fine_agg_ratio', 'water_binder_ratio'],
      dtype='object')
In [48]:
# Distribution looks fine.

# let us check the centers in each group
centroids = kmeans.cluster_centers_
centroid_df = pd.DataFrame(centroids, columns = list(df_temp))
centroid_df.transpose()
Out[48]:
0 1 2
superplastic -0.742115 0.893720 -0.323367
age 1.968070 -0.151869 -0.430273
strength 0.487739 0.926046 -0.666112
water_cement_ratio -0.046373 -0.822992 0.490015
coarse_fine_agg_ratio 0.490020 -0.197640 -0.014342
water_binder_ratio 0.453217 -1.038539 0.483547
In [49]:
predictions = kmeans.predict(df_std)
predictions
df_std["group"] = predictions
df_std['group'] = df_std['group'].astype('category')
df_std.dtypes
Out[49]:
superplastic              float64
age                       float64
strength                  float64
water_cement_ratio        float64
coarse_fine_agg_ratio     float64
water_binder_ratio        float64
group                    category
dtype: object
In [50]:
# Visualize the centers

df_std.boxplot(by = 'group',  layout=(3,4), figsize=(15, 10))
Out[50]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x0000016013A00A90>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000016014DBE278>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000016013A04A58>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000016011B02940>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000016013464EF0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000160134644E0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000160139F3F60>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000016014D73A20>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x000001601504A4E0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000016015063F60>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000016014E37A20>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000016014DF74E0>]],
      dtype=object)
In [51]:
# Addressing outliers at group level       
def replace(group):
    median, std = group.median(), group.std()  #Get the median and the standard deviation of every group 
    outliers = (group - median).abs() > 2*std # Subtract median from every member of each group. Take absolute values > 2std
    group[outliers] = group.median()       
    return group

data_corrected = (df_std.groupby('group').transform(replace)) 
concat_data = data_corrected.join(pd.DataFrame(df_std['group']))
In [52]:
concat_data.boxplot(by = 'group', layout=(2,4), figsize=(15, 10))
Out[52]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x00000160154CBD30>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000016015536B70>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000160151082E8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001601512DAC8>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x000001601515E2E8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001601515E320>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000160151B52E8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000160151D9D68>]],
      dtype=object)

Note: When we remove outliers and replace with median or mean, the distribution shape changes, the standard deviation becomes tighter creating new outliers. The new outliers would be much closer to the centre than original outliers so we accept them without modifying them

No distinct clusters seems to be visible, lets explore further.

In [53]:
df_group0=df_std[(df_std['group']==0)]
df_group1=df_std[(df_std['group']==1)]
df_group2=df_std[(df_std['group']==2)]
In [54]:
sns.pairplot(df_group0,diag_kind='kde')
Out[54]:
<seaborn.axisgrid.PairGrid at 0x160150e7470>
In [55]:
sns.pairplot(df_group1,diag_kind='kde')
Out[55]:
<seaborn.axisgrid.PairGrid at 0x160180a8710>
In [56]:
sns.pairplot(df_group2,diag_kind='kde')
Out[56]:
<seaborn.axisgrid.PairGrid at 0x1601ac28908>
In [57]:
# ash vs strength

var = 'water_binder_ratio'

with sns.axes_style("white"):
    plot = sns.lmplot(var,'strength',data=concat_data,hue='group')
#plot.set(ylim = (-3,3))
In [58]:
# age vs strength
var = 'age'

with sns.axes_style("white"):
    plot = sns.lmplot(var,'strength',data=concat_data,hue='group')
In [59]:
# superplastic vs strength
var = 'superplastic'

with sns.axes_style("white"):
    plot = sns.lmplot(var,'strength',data=concat_data,hue='group')
In [60]:
# water_cement_ratio vs strength
var = 'water_cement_ratio'

with sns.axes_style("white"):
    plot = sns.lmplot(var,'strength',data=concat_data,hue='group')
In [61]:
# coarse_fine_agg_ratio vs strength
var = 'coarse_fine_agg_ratio'

with sns.axes_style("white"):
    plot = sns.lmplot(var,'strength',data=concat_data,hue='group')

Observation

There don't seem to be a clear seperation of clusters among features. It seems to be unlikely to get better results by splitting the data into clusters.

Lets tune GradientBoost and Xgboost models further to get a better performance

Set the range of hyperparameters to be tuned with RandonSearch for GradientBoost. Hyperparameters are the parameters that are initialized before training a model because they cannot be learned from the algorithm. They control the behavior of the training algorithm and have a high impact on the performance of a model.

In [62]:
from sklearn.model_selection import RandomizedSearchCV
import pprint

#Hyperparameters are the parameters that are initialized before training a model 
#because they cannot be learned from the algorithm. They control the behavior of the training algorithm 
#and have a high impact on the performance of a model

learning_rate = [0.01, 0.5, 0.1,1]
# Number of trees in GradientBoost
n_estimators = [int(x) for x in np.linspace(start = 100 , stop = 1000, num = 4)]   # returns evenly spaced 10 numbers

# Number of features to consider at every split
max_features = ['auto', 'sqrt']

# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(3, 10, num = 4)]  # returns evenly spaced numbers can be changed to any
max_depth.append(None)

#the fraction of observations to be used in individual tree
subsample = [0.8,0.9,1]

# Minimum number of samples required to split a node
min_samples_split = [0.5, 1.0, 2, 5, 8, 10]

# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4]


# Create the random grid
rs_param_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'learning_rate':learning_rate,
                'subsample':subsample}

pprint.pprint(rs_param_grid)
{'learning_rate': [0.01, 0.5, 0.1, 1],
 'max_depth': [3, 5, 7, 10, None],
 'max_features': ['auto', 'sqrt'],
 'min_samples_leaf': [1, 2, 4],
 'min_samples_split': [0.5, 1.0, 2, 5, 8, 10],
 'n_estimators': [100, 400, 700, 1000],
 'subsample': [0.8, 0.9, 1]}

Tune GBM using randomisedsearchcv

In [63]:
gbr_clf = GradientBoostingRegressor(random_state=2)
rf_random = RandomizedSearchCV(estimator=gbr_clf, param_distributions=rs_param_grid,
                              n_iter = 10, scoring='neg_mean_squared_error', 
                              cv = 5, verbose=2, random_state=42, n_jobs=-1,
                              return_train_score=True)

# Fit the random search model
rf_random.fit(X_train, y_train)
Fitting 5 folds for each of 10 candidates, totalling 50 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:  3.0min
[Parallel(n_jobs=-1)]: Done  50 out of  50 | elapsed:  3.2min finished
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py:813: DeprecationWarning: The default of the `iid` parameter will change from True to False in version 0.22 and will be removed in 0.24. This will change numeric results when test-set sizes are unequal.
  DeprecationWarning)
Out[63]:
RandomizedSearchCV(cv=5, error_score='raise-deprecating',
                   estimator=GradientBoostingRegressor(alpha=0.9,
                                                       criterion='friedman_mse',
                                                       init=None,
                                                       learning_rate=0.1,
                                                       loss='ls', max_depth=3,
                                                       max_features=None,
                                                       max_leaf_nodes=None,
                                                       min_impurity_decrease=0.0,
                                                       min_impurity_split=None,
                                                       min_samples_leaf=1,
                                                       min_samples_split=2,
                                                       min_weight_fraction_leaf=0.0,
                                                       n_estimators=100,...
                   param_distributions={'learning_rate': [0.01, 0.5, 0.1, 1],
                                        'max_depth': [3, 5, 7, 10, None],
                                        'max_features': ['auto', 'sqrt'],
                                        'min_samples_leaf': [1, 2, 4],
                                        'min_samples_split': [0.5, 1.0, 2, 5, 8,
                                                              10],
                                        'n_estimators': [100, 400, 700, 1000],
                                        'subsample': [0.8, 0.9, 1]},
                   pre_dispatch='2*n_jobs', random_state=42, refit=True,
                   return_train_score=True, scoring='neg_mean_squared_error',
                   verbose=2)
In [64]:
rf_random.best_params_
Out[64]:
{'subsample': 0.9,
 'n_estimators': 1000,
 'min_samples_split': 2,
 'min_samples_leaf': 2,
 'max_features': 'auto',
 'max_depth': 3,
 'learning_rate': 0.1}

Evaluate the score of GBM by RandomisedSearchCV

In [65]:
best_random = rf_random.best_estimator_

best_random.score(X_test , y_test)
Out[65]:
0.9232710755331391

Set the hyperparameters in the range closer to the one output by RandomizedSearchCV, to tune using GrinSearchCV

In [66]:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'subsample': [0.8,0.85,0.9,0.95,1],
     'n_estimators': [800,900,1000,1200],
     'min_samples_split': [2,4,6],
     'min_samples_leaf': [2],
     'max_features': ['auto','sqrt'],
     'max_depth': [3,5],
     'learning_rate': [0.15, 0.1,0.05,0.01]
}    

Tune the GBM model using GridSearchCV

In [67]:
grid_search = GridSearchCV(estimator = gbr_clf, param_grid = param_grid, 
                          cv = 5, n_jobs = -1, verbose = 2, return_train_score=True,scoring='neg_mean_squared_error')
In [68]:
# Fit the grid search to the data
grid_search.fit(X_train, y_train);
Fitting 5 folds for each of 960 candidates, totalling 4800 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:    7.6s
[Parallel(n_jobs=-1)]: Done 154 tasks      | elapsed:   39.4s
[Parallel(n_jobs=-1)]: Done 357 tasks      | elapsed:  1.5min
[Parallel(n_jobs=-1)]: Done 640 tasks      | elapsed:  2.8min
[Parallel(n_jobs=-1)]: Done 1005 tasks      | elapsed:  4.9min
[Parallel(n_jobs=-1)]: Done 1450 tasks      | elapsed:  7.3min
[Parallel(n_jobs=-1)]: Done 1977 tasks      | elapsed: 10.0min
[Parallel(n_jobs=-1)]: Done 2584 tasks      | elapsed: 13.4min
[Parallel(n_jobs=-1)]: Done 3273 tasks      | elapsed: 17.4min
[Parallel(n_jobs=-1)]: Done 4042 tasks      | elapsed: 21.7min
[Parallel(n_jobs=-1)]: Done 4800 out of 4800 | elapsed: 26.9min finished
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py:813: DeprecationWarning: The default of the `iid` parameter will change from True to False in version 0.22 and will be removed in 0.24. This will change numeric results when test-set sizes are unequal.
  DeprecationWarning)
In [69]:
grid_search.best_params_
Out[69]:
{'learning_rate': 0.05,
 'max_depth': 3,
 'max_features': 'sqrt',
 'min_samples_leaf': 2,
 'min_samples_split': 2,
 'n_estimators': 1200,
 'subsample': 1}

Evaluate the score for GridSerachCV

In [70]:
best_grid = grid_search.best_estimator_
best_score = best_grid.score(X_test, y_test)
best_score
Out[70]:
0.924390023043305
In [71]:
best_grid
Out[71]:
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                          learning_rate=0.05, loss='ls', max_depth=3,
                          max_features='sqrt', max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=2, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=1200,
                          n_iter_no_change=None, presort='auto', random_state=2,
                          subsample=1, tol=0.0001, validation_fraction=0.1,
                          verbose=0, warm_start=False)

Save the model to a file, for later use.

In [72]:
import pickle
filename = 'concrete_model.sav'
pickle.dump(best_grid,open(filename,'wb'))

Let's try to tune Xgboost parameter and see if we get better performance than GBM

In [73]:
xgb_model = xgboost.XGBRegressor(learning_rate=0.01,  
                      colsample_bytree = 1,
                      subsample = 0.8,
                      objective='reg:squarederror', 
                      n_estimators=1000, 
                      max_depth=3, 
                      gamma=1)
In [74]:
eval_set = [(X_train, y_train), (X_test, y_test)]
%time xgb_model.fit(X_train, y_train, eval_metric='rmse', eval_set=eval_set, verbose=True)
evals_result = xgb_model.evals_result()
[0]	validation_0-rmse:38.7193	validation_1-rmse:38.6715
[1]	validation_0-rmse:38.3571	validation_1-rmse:38.313
[2]	validation_0-rmse:38.0009	validation_1-rmse:37.9691
[3]	validation_0-rmse:37.6448	validation_1-rmse:37.6191
[4]	validation_0-rmse:37.2935	validation_1-rmse:37.2792
[5]	validation_0-rmse:36.9479	validation_1-rmse:36.9404
[6]	validation_0-rmse:36.604	validation_1-rmse:36.6025
[7]	validation_0-rmse:36.2626	validation_1-rmse:36.2703
[8]	validation_0-rmse:35.9269	validation_1-rmse:35.9427
[9]	validation_0-rmse:35.5937	validation_1-rmse:35.6146
[10]	validation_0-rmse:35.2657	validation_1-rmse:35.3023
[11]	validation_0-rmse:34.941	validation_1-rmse:34.987
[12]	validation_0-rmse:34.6214	validation_1-rmse:34.6795
[13]	validation_0-rmse:34.3034	validation_1-rmse:34.3672
[14]	validation_0-rmse:33.9896	validation_1-rmse:34.0689
[15]	validation_0-rmse:33.6742	validation_1-rmse:33.7593
[16]	validation_0-rmse:33.3695	validation_1-rmse:33.4665
[17]	validation_0-rmse:33.0633	validation_1-rmse:33.1683
[18]	validation_0-rmse:32.7582	validation_1-rmse:32.8684
[19]	validation_0-rmse:32.4583	validation_1-rmse:32.5713
[20]	validation_0-rmse:32.1606	validation_1-rmse:32.2803
[21]	validation_0-rmse:31.8706	validation_1-rmse:32.0073
[22]	validation_0-rmse:31.5787	validation_1-rmse:31.7164
[23]	validation_0-rmse:31.2892	validation_1-rmse:31.4339
[24]	validation_0-rmse:31.0059	validation_1-rmse:31.1581
[25]	validation_0-rmse:30.7202	validation_1-rmse:30.876
[26]	validation_0-rmse:30.4415	validation_1-rmse:30.5996
[27]	validation_0-rmse:30.1649	validation_1-rmse:30.3376
[28]	validation_0-rmse:29.8904	validation_1-rmse:30.0694
[29]	validation_0-rmse:29.6204	validation_1-rmse:29.8148
[30]	validation_0-rmse:29.3496	validation_1-rmse:29.5508
[31]	validation_0-rmse:29.0851	validation_1-rmse:29.2979
[32]	validation_0-rmse:28.82	validation_1-rmse:29.0371
[33]	validation_0-rmse:28.5608	validation_1-rmse:28.7869
[34]	validation_0-rmse:28.303	validation_1-rmse:28.5302
[35]	validation_0-rmse:28.0489	validation_1-rmse:28.2833
[36]	validation_0-rmse:27.7983	validation_1-rmse:28.036
[37]	validation_0-rmse:27.5497	validation_1-rmse:27.7925
[38]	validation_0-rmse:27.3018	validation_1-rmse:27.548
[39]	validation_0-rmse:27.0573	validation_1-rmse:27.306
[40]	validation_0-rmse:26.811	validation_1-rmse:27.0612
[41]	validation_0-rmse:26.5693	validation_1-rmse:26.8194
[42]	validation_0-rmse:26.3322	validation_1-rmse:26.585
[43]	validation_0-rmse:26.0978	validation_1-rmse:26.3534
[44]	validation_0-rmse:25.8663	validation_1-rmse:26.1253
[45]	validation_0-rmse:25.6343	validation_1-rmse:25.8976
[46]	validation_0-rmse:25.4072	validation_1-rmse:25.6759
[47]	validation_0-rmse:25.1814	validation_1-rmse:25.4544
[48]	validation_0-rmse:24.9578	validation_1-rmse:25.23
[49]	validation_0-rmse:24.7359	validation_1-rmse:25.015
[50]	validation_0-rmse:24.5176	validation_1-rmse:24.7979
[51]	validation_0-rmse:24.301	validation_1-rmse:24.5856
[52]	validation_0-rmse:24.0871	validation_1-rmse:24.372
[53]	validation_0-rmse:23.8741	validation_1-rmse:24.1599
[54]	validation_0-rmse:23.6665	validation_1-rmse:23.9546
[55]	validation_0-rmse:23.4616	validation_1-rmse:23.7541
[56]	validation_0-rmse:23.2547	validation_1-rmse:23.5487
[57]	validation_0-rmse:23.0529	validation_1-rmse:23.3518
[58]	validation_0-rmse:22.8493	validation_1-rmse:23.1449
[59]	validation_0-rmse:22.6514	validation_1-rmse:22.9465
[60]	validation_0-rmse:22.4541	validation_1-rmse:22.753
[61]	validation_0-rmse:22.2575	validation_1-rmse:22.5567
[62]	validation_0-rmse:22.067	validation_1-rmse:22.3691
[63]	validation_0-rmse:21.8739	validation_1-rmse:22.1742
[64]	validation_0-rmse:21.6836	validation_1-rmse:21.9842
[65]	validation_0-rmse:21.4951	validation_1-rmse:21.7964
[66]	validation_0-rmse:21.3085	validation_1-rmse:21.6129
[67]	validation_0-rmse:21.1257	validation_1-rmse:21.4338
[68]	validation_0-rmse:20.9449	validation_1-rmse:21.2571
[69]	validation_0-rmse:20.7656	validation_1-rmse:21.0801
[70]	validation_0-rmse:20.5899	validation_1-rmse:20.911
[71]	validation_0-rmse:20.4177	validation_1-rmse:20.7446
[72]	validation_0-rmse:20.2443	validation_1-rmse:20.5749
[73]	validation_0-rmse:20.0705	validation_1-rmse:20.4029
[74]	validation_0-rmse:19.8995	validation_1-rmse:20.2328
[75]	validation_0-rmse:19.7295	validation_1-rmse:20.0664
[76]	validation_0-rmse:19.5628	validation_1-rmse:19.9035
[77]	validation_0-rmse:19.3992	validation_1-rmse:19.736
[78]	validation_0-rmse:19.2353	validation_1-rmse:19.575
[79]	validation_0-rmse:19.0689	validation_1-rmse:19.4129
[80]	validation_0-rmse:18.9088	validation_1-rmse:19.254
[81]	validation_0-rmse:18.7512	validation_1-rmse:19.1014
[82]	validation_0-rmse:18.5964	validation_1-rmse:18.9506
[83]	validation_0-rmse:18.4429	validation_1-rmse:18.7989
[84]	validation_0-rmse:18.2899	validation_1-rmse:18.6486
[85]	validation_0-rmse:18.1386	validation_1-rmse:18.4983
[86]	validation_0-rmse:17.9887	validation_1-rmse:18.3489
[87]	validation_0-rmse:17.8384	validation_1-rmse:18.1978
[88]	validation_0-rmse:17.6951	validation_1-rmse:18.0567
[89]	validation_0-rmse:17.5492	validation_1-rmse:17.9147
[90]	validation_0-rmse:17.4062	validation_1-rmse:17.7716
[91]	validation_0-rmse:17.2653	validation_1-rmse:17.633
[92]	validation_0-rmse:17.1273	validation_1-rmse:17.4968
[93]	validation_0-rmse:16.9862	validation_1-rmse:17.3559
[94]	validation_0-rmse:16.8477	validation_1-rmse:17.2181
[95]	validation_0-rmse:16.7134	validation_1-rmse:17.0876
[96]	validation_0-rmse:16.5804	validation_1-rmse:16.9551
[97]	validation_0-rmse:16.4492	validation_1-rmse:16.8224
[98]	validation_0-rmse:16.3179	validation_1-rmse:16.6894
[99]	validation_0-rmse:16.1869	validation_1-rmse:16.5602
[100]	validation_0-rmse:16.0606	validation_1-rmse:16.4368
[101]	validation_0-rmse:15.9331	validation_1-rmse:16.3124
[102]	validation_0-rmse:15.8059	validation_1-rmse:16.1885
[103]	validation_0-rmse:15.6814	validation_1-rmse:16.0683
[104]	validation_0-rmse:15.557	validation_1-rmse:15.9437
[105]	validation_0-rmse:15.4336	validation_1-rmse:15.8183
[106]	validation_0-rmse:15.3119	validation_1-rmse:15.6984
[107]	validation_0-rmse:15.191	validation_1-rmse:15.5786
[108]	validation_0-rmse:15.0739	validation_1-rmse:15.4622
[109]	validation_0-rmse:14.9597	validation_1-rmse:15.3477
[110]	validation_0-rmse:14.8442	validation_1-rmse:15.2306
[111]	validation_0-rmse:14.7308	validation_1-rmse:15.12
[112]	validation_0-rmse:14.6174	validation_1-rmse:15.0077
[113]	validation_0-rmse:14.5082	validation_1-rmse:14.8985
[114]	validation_0-rmse:14.3978	validation_1-rmse:14.7896
[115]	validation_0-rmse:14.2878	validation_1-rmse:14.6824
[116]	validation_0-rmse:14.177	validation_1-rmse:14.5713
[117]	validation_0-rmse:14.0702	validation_1-rmse:14.4672
[118]	validation_0-rmse:13.9635	validation_1-rmse:14.361
[119]	validation_0-rmse:13.86	validation_1-rmse:14.2594
[120]	validation_0-rmse:13.7554	validation_1-rmse:14.1536
[121]	validation_0-rmse:13.6535	validation_1-rmse:14.0534
[122]	validation_0-rmse:13.5541	validation_1-rmse:13.9511
[123]	validation_0-rmse:13.4531	validation_1-rmse:13.8508
[124]	validation_0-rmse:13.3537	validation_1-rmse:13.7526
[125]	validation_0-rmse:13.2562	validation_1-rmse:13.6552
[126]	validation_0-rmse:13.1603	validation_1-rmse:13.561
[127]	validation_0-rmse:13.0667	validation_1-rmse:13.4667
[128]	validation_0-rmse:12.9714	validation_1-rmse:13.3695
[129]	validation_0-rmse:12.8768	validation_1-rmse:13.2702
[130]	validation_0-rmse:12.7846	validation_1-rmse:13.176
[131]	validation_0-rmse:12.6921	validation_1-rmse:13.0805
[132]	validation_0-rmse:12.6023	validation_1-rmse:12.9916
[133]	validation_0-rmse:12.5135	validation_1-rmse:12.9046
[134]	validation_0-rmse:12.4269	validation_1-rmse:12.8196
[135]	validation_0-rmse:12.3417	validation_1-rmse:12.7333
[136]	validation_0-rmse:12.2552	validation_1-rmse:12.6441
[137]	validation_0-rmse:12.1691	validation_1-rmse:12.5573
[138]	validation_0-rmse:12.0851	validation_1-rmse:12.4727
[139]	validation_0-rmse:12.0033	validation_1-rmse:12.3871
[140]	validation_0-rmse:11.9219	validation_1-rmse:12.3031
[141]	validation_0-rmse:11.8413	validation_1-rmse:12.2217
[142]	validation_0-rmse:11.7616	validation_1-rmse:12.142
[143]	validation_0-rmse:11.6821	validation_1-rmse:12.0624
[144]	validation_0-rmse:11.6033	validation_1-rmse:11.9823
[145]	validation_0-rmse:11.5215	validation_1-rmse:11.8991
[146]	validation_0-rmse:11.4436	validation_1-rmse:11.8218
[147]	validation_0-rmse:11.3675	validation_1-rmse:11.7433
[148]	validation_0-rmse:11.2939	validation_1-rmse:11.6692
[149]	validation_0-rmse:11.2216	validation_1-rmse:11.5981
[150]	validation_0-rmse:11.148	validation_1-rmse:11.5252
[151]	validation_0-rmse:11.0736	validation_1-rmse:11.4504
[152]	validation_0-rmse:11.0009	validation_1-rmse:11.3741
[153]	validation_0-rmse:10.9309	validation_1-rmse:11.3037
[154]	validation_0-rmse:10.8624	validation_1-rmse:11.2335
[155]	validation_0-rmse:10.7921	validation_1-rmse:11.1602
[156]	validation_0-rmse:10.7254	validation_1-rmse:11.0918
[157]	validation_0-rmse:10.6573	validation_1-rmse:11.0205
[158]	validation_0-rmse:10.5908	validation_1-rmse:10.9514
[159]	validation_0-rmse:10.5239	validation_1-rmse:10.8824
[160]	validation_0-rmse:10.4598	validation_1-rmse:10.8188
[161]	validation_0-rmse:10.3974	validation_1-rmse:10.7571
[162]	validation_0-rmse:10.3325	validation_1-rmse:10.6896
[163]	validation_0-rmse:10.2697	validation_1-rmse:10.6262
[164]	validation_0-rmse:10.2071	validation_1-rmse:10.5631
[165]	validation_0-rmse:10.147	validation_1-rmse:10.5033
[166]	validation_0-rmse:10.0866	validation_1-rmse:10.4415
[167]	validation_0-rmse:10.0275	validation_1-rmse:10.3828
[168]	validation_0-rmse:9.96784	validation_1-rmse:10.3217
[169]	validation_0-rmse:9.90706	validation_1-rmse:10.2587
[170]	validation_0-rmse:9.8498	validation_1-rmse:10.1973
[171]	validation_0-rmse:9.79342	validation_1-rmse:10.1424
[172]	validation_0-rmse:9.73698	validation_1-rmse:10.0861
[173]	validation_0-rmse:9.68074	validation_1-rmse:10.0285
[174]	validation_0-rmse:9.6239	validation_1-rmse:9.97026
[175]	validation_0-rmse:9.57115	validation_1-rmse:9.91702
[176]	validation_0-rmse:9.51787	validation_1-rmse:9.86404
[177]	validation_0-rmse:9.46642	validation_1-rmse:9.8115
[178]	validation_0-rmse:9.41442	validation_1-rmse:9.75946
[179]	validation_0-rmse:9.36382	validation_1-rmse:9.70733
[180]	validation_0-rmse:9.31295	validation_1-rmse:9.65441
[181]	validation_0-rmse:9.26258	validation_1-rmse:9.60501
[182]	validation_0-rmse:9.20976	validation_1-rmse:9.5547
[183]	validation_0-rmse:9.16082	validation_1-rmse:9.50269
[184]	validation_0-rmse:9.10955	validation_1-rmse:9.44679
[185]	validation_0-rmse:9.06123	validation_1-rmse:9.39836
[186]	validation_0-rmse:9.01061	validation_1-rmse:9.3454
[187]	validation_0-rmse:8.96335	validation_1-rmse:9.29835
[188]	validation_0-rmse:8.91773	validation_1-rmse:9.25215
[189]	validation_0-rmse:8.86935	validation_1-rmse:9.202
[190]	validation_0-rmse:8.82239	validation_1-rmse:9.15438
[191]	validation_0-rmse:8.77878	validation_1-rmse:9.11276
[192]	validation_0-rmse:8.73158	validation_1-rmse:9.06335
[193]	validation_0-rmse:8.68857	validation_1-rmse:9.01789
[194]	validation_0-rmse:8.64545	validation_1-rmse:8.97377
[195]	validation_0-rmse:8.60441	validation_1-rmse:8.93068
[196]	validation_0-rmse:8.56285	validation_1-rmse:8.88677
[197]	validation_0-rmse:8.52313	validation_1-rmse:8.84615
[198]	validation_0-rmse:8.48318	validation_1-rmse:8.80376
[199]	validation_0-rmse:8.44356	validation_1-rmse:8.76331
[200]	validation_0-rmse:8.40472	validation_1-rmse:8.72438
[201]	validation_0-rmse:8.36478	validation_1-rmse:8.68076
[202]	validation_0-rmse:8.32411	validation_1-rmse:8.63665
[203]	validation_0-rmse:8.28562	validation_1-rmse:8.59609
[204]	validation_0-rmse:8.24641	validation_1-rmse:8.55465
[205]	validation_0-rmse:8.20942	validation_1-rmse:8.51598
[206]	validation_0-rmse:8.17327	validation_1-rmse:8.47988
[207]	validation_0-rmse:8.13629	validation_1-rmse:8.4428
[208]	validation_0-rmse:8.1018	validation_1-rmse:8.40642
[209]	validation_0-rmse:8.06516	validation_1-rmse:8.37131
[210]	validation_0-rmse:8.02844	validation_1-rmse:8.33639
[211]	validation_0-rmse:7.99031	validation_1-rmse:8.29636
[212]	validation_0-rmse:7.9576	validation_1-rmse:8.26295
[213]	validation_0-rmse:7.92178	validation_1-rmse:8.22415
[214]	validation_0-rmse:7.88932	validation_1-rmse:8.18978
[215]	validation_0-rmse:7.85707	validation_1-rmse:8.15607
[216]	validation_0-rmse:7.82489	validation_1-rmse:8.12434
[217]	validation_0-rmse:7.79375	validation_1-rmse:8.09282
[218]	validation_0-rmse:7.76125	validation_1-rmse:8.05796
[219]	validation_0-rmse:7.72987	validation_1-rmse:8.02505
[220]	validation_0-rmse:7.70052	validation_1-rmse:7.99385
[221]	validation_0-rmse:7.66917	validation_1-rmse:7.96154
[222]	validation_0-rmse:7.63864	validation_1-rmse:7.93178
[223]	validation_0-rmse:7.60865	validation_1-rmse:7.89989
[224]	validation_0-rmse:7.57918	validation_1-rmse:7.86974
[225]	validation_0-rmse:7.55106	validation_1-rmse:7.84121
[226]	validation_0-rmse:7.52152	validation_1-rmse:7.81231
[227]	validation_0-rmse:7.49291	validation_1-rmse:7.7804
[228]	validation_0-rmse:7.46669	validation_1-rmse:7.755
[229]	validation_0-rmse:7.43739	validation_1-rmse:7.7228
[230]	validation_0-rmse:7.41104	validation_1-rmse:7.69582
[231]	validation_0-rmse:7.3831	validation_1-rmse:7.66755
[232]	validation_0-rmse:7.35576	validation_1-rmse:7.64009
[233]	validation_0-rmse:7.33033	validation_1-rmse:7.61389
[234]	validation_0-rmse:7.30494	validation_1-rmse:7.58814
[235]	validation_0-rmse:7.27937	validation_1-rmse:7.56095
[236]	validation_0-rmse:7.25521	validation_1-rmse:7.53507
[237]	validation_0-rmse:7.23137	validation_1-rmse:7.50873
[238]	validation_0-rmse:7.20657	validation_1-rmse:7.48207
[239]	validation_0-rmse:7.18196	validation_1-rmse:7.45662
[240]	validation_0-rmse:7.16023	validation_1-rmse:7.43452
[241]	validation_0-rmse:7.13664	validation_1-rmse:7.41121
[242]	validation_0-rmse:7.11461	validation_1-rmse:7.38861
[243]	validation_0-rmse:7.09339	validation_1-rmse:7.36773
[244]	validation_0-rmse:7.0719	validation_1-rmse:7.34578
[245]	validation_0-rmse:7.04619	validation_1-rmse:7.31896
[246]	validation_0-rmse:7.02333	validation_1-rmse:7.29592
[247]	validation_0-rmse:7.00217	validation_1-rmse:7.27309
[248]	validation_0-rmse:6.97945	validation_1-rmse:7.24661
[249]	validation_0-rmse:6.95863	validation_1-rmse:7.22501
[250]	validation_0-rmse:6.93859	validation_1-rmse:7.20367
[251]	validation_0-rmse:6.91933	validation_1-rmse:7.18508
[252]	validation_0-rmse:6.89743	validation_1-rmse:7.16132
[253]	validation_0-rmse:6.87711	validation_1-rmse:7.14044
[254]	validation_0-rmse:6.85651	validation_1-rmse:7.11939
[255]	validation_0-rmse:6.83823	validation_1-rmse:7.10215
[256]	validation_0-rmse:6.81756	validation_1-rmse:7.08
[257]	validation_0-rmse:6.79865	validation_1-rmse:7.05918
[258]	validation_0-rmse:6.7798	validation_1-rmse:7.03848
[259]	validation_0-rmse:6.76179	validation_1-rmse:7.01971
[260]	validation_0-rmse:6.7419	validation_1-rmse:6.99968
[261]	validation_0-rmse:6.72429	validation_1-rmse:6.98108
[262]	validation_0-rmse:6.70479	validation_1-rmse:6.9595
[263]	validation_0-rmse:6.68497	validation_1-rmse:6.93926
[264]	validation_0-rmse:6.66805	validation_1-rmse:6.92138
[265]	validation_0-rmse:6.64996	validation_1-rmse:6.90325
[266]	validation_0-rmse:6.63311	validation_1-rmse:6.88699
[267]	validation_0-rmse:6.61878	validation_1-rmse:6.87104
[268]	validation_0-rmse:6.60226	validation_1-rmse:6.85572
[269]	validation_0-rmse:6.58591	validation_1-rmse:6.83992
[270]	validation_0-rmse:6.56895	validation_1-rmse:6.8231
[271]	validation_0-rmse:6.55115	validation_1-rmse:6.80627
[272]	validation_0-rmse:6.53285	validation_1-rmse:6.78809
[273]	validation_0-rmse:6.51666	validation_1-rmse:6.77358
[274]	validation_0-rmse:6.50167	validation_1-rmse:6.75679
[275]	validation_0-rmse:6.4851	validation_1-rmse:6.74014
[276]	validation_0-rmse:6.47017	validation_1-rmse:6.72436
[277]	validation_0-rmse:6.45339	validation_1-rmse:6.70635
[278]	validation_0-rmse:6.43728	validation_1-rmse:6.6878
[279]	validation_0-rmse:6.42283	validation_1-rmse:6.67182
[280]	validation_0-rmse:6.40891	validation_1-rmse:6.65699
[281]	validation_0-rmse:6.3946	validation_1-rmse:6.64276
[282]	validation_0-rmse:6.38024	validation_1-rmse:6.62742
[283]	validation_0-rmse:6.36492	validation_1-rmse:6.61064
[284]	validation_0-rmse:6.35152	validation_1-rmse:6.59663
[285]	validation_0-rmse:6.33749	validation_1-rmse:6.58188
[286]	validation_0-rmse:6.3246	validation_1-rmse:6.56877
[287]	validation_0-rmse:6.31171	validation_1-rmse:6.55711
[288]	validation_0-rmse:6.29931	validation_1-rmse:6.54415
[289]	validation_0-rmse:6.28545	validation_1-rmse:6.52982
[290]	validation_0-rmse:6.27091	validation_1-rmse:6.51321
[291]	validation_0-rmse:6.2578	validation_1-rmse:6.50082
[292]	validation_0-rmse:6.24479	validation_1-rmse:6.48766
[293]	validation_0-rmse:6.2331	validation_1-rmse:6.47567
[294]	validation_0-rmse:6.22185	validation_1-rmse:6.46447
[295]	validation_0-rmse:6.20859	validation_1-rmse:6.4514
[296]	validation_0-rmse:6.19642	validation_1-rmse:6.43875
[297]	validation_0-rmse:6.18307	validation_1-rmse:6.42461
[298]	validation_0-rmse:6.17004	validation_1-rmse:6.41032
[299]	validation_0-rmse:6.15984	validation_1-rmse:6.39912
[300]	validation_0-rmse:6.14776	validation_1-rmse:6.38811
[301]	validation_0-rmse:6.13479	validation_1-rmse:6.3732
[302]	validation_0-rmse:6.12397	validation_1-rmse:6.36277
[303]	validation_0-rmse:6.11311	validation_1-rmse:6.35163
[304]	validation_0-rmse:6.10124	validation_1-rmse:6.34085
[305]	validation_0-rmse:6.09158	validation_1-rmse:6.3301
[306]	validation_0-rmse:6.08055	validation_1-rmse:6.32046
[307]	validation_0-rmse:6.07065	validation_1-rmse:6.30985
[308]	validation_0-rmse:6.0589	validation_1-rmse:6.29817
[309]	validation_0-rmse:6.04969	validation_1-rmse:6.28771
[310]	validation_0-rmse:6.03614	validation_1-rmse:6.27507
[311]	validation_0-rmse:6.02544	validation_1-rmse:6.26365
[312]	validation_0-rmse:6.01418	validation_1-rmse:6.25241
[313]	validation_0-rmse:6.00215	validation_1-rmse:6.23961
[314]	validation_0-rmse:5.99211	validation_1-rmse:6.22916
[315]	validation_0-rmse:5.97971	validation_1-rmse:6.21811
[316]	validation_0-rmse:5.9701	validation_1-rmse:6.2078
[317]	validation_0-rmse:5.9606	validation_1-rmse:6.19832
[318]	validation_0-rmse:5.94997	validation_1-rmse:6.18806
[319]	validation_0-rmse:5.9406	validation_1-rmse:6.17888
[320]	validation_0-rmse:5.92971	validation_1-rmse:6.16888
[321]	validation_0-rmse:5.91864	validation_1-rmse:6.15879
[322]	validation_0-rmse:5.91052	validation_1-rmse:6.15098
[323]	validation_0-rmse:5.90119	validation_1-rmse:6.14294
[324]	validation_0-rmse:5.89305	validation_1-rmse:6.13492
[325]	validation_0-rmse:5.88302	validation_1-rmse:6.12413
[326]	validation_0-rmse:5.87541	validation_1-rmse:6.1162
[327]	validation_0-rmse:5.86698	validation_1-rmse:6.10891
[328]	validation_0-rmse:5.85655	validation_1-rmse:6.09729
[329]	validation_0-rmse:5.84687	validation_1-rmse:6.08738
[330]	validation_0-rmse:5.83776	validation_1-rmse:6.07785
[331]	validation_0-rmse:5.83005	validation_1-rmse:6.06999
[332]	validation_0-rmse:5.82007	validation_1-rmse:6.05909
[333]	validation_0-rmse:5.8129	validation_1-rmse:6.05205
[334]	validation_0-rmse:5.80082	validation_1-rmse:6.03958
[335]	validation_0-rmse:5.78972	validation_1-rmse:6.03124
[336]	validation_0-rmse:5.78272	validation_1-rmse:6.02428
[337]	validation_0-rmse:5.77476	validation_1-rmse:6.01606
[338]	validation_0-rmse:5.76539	validation_1-rmse:6.00844
[339]	validation_0-rmse:5.75847	validation_1-rmse:6.00159
[340]	validation_0-rmse:5.75056	validation_1-rmse:5.99517
[341]	validation_0-rmse:5.7435	validation_1-rmse:5.98901
[342]	validation_0-rmse:5.73417	validation_1-rmse:5.98267
[343]	validation_0-rmse:5.72694	validation_1-rmse:5.97584
[344]	validation_0-rmse:5.72002	validation_1-rmse:5.96933
[345]	validation_0-rmse:5.71174	validation_1-rmse:5.96055
[346]	validation_0-rmse:5.70458	validation_1-rmse:5.95355
[347]	validation_0-rmse:5.69544	validation_1-rmse:5.94509
[348]	validation_0-rmse:5.68816	validation_1-rmse:5.93773
[349]	validation_0-rmse:5.68173	validation_1-rmse:5.93104
[350]	validation_0-rmse:5.6744	validation_1-rmse:5.92378
[351]	validation_0-rmse:5.66815	validation_1-rmse:5.918
[352]	validation_0-rmse:5.65927	validation_1-rmse:5.90864
[353]	validation_0-rmse:5.65174	validation_1-rmse:5.9026
[354]	validation_0-rmse:5.64139	validation_1-rmse:5.89464
[355]	validation_0-rmse:5.63241	validation_1-rmse:5.88749
[356]	validation_0-rmse:5.6261	validation_1-rmse:5.8823
[357]	validation_0-rmse:5.61861	validation_1-rmse:5.87455
[358]	validation_0-rmse:5.61053	validation_1-rmse:5.8668
[359]	validation_0-rmse:5.60265	validation_1-rmse:5.85966
[360]	validation_0-rmse:5.59516	validation_1-rmse:5.85264
[361]	validation_0-rmse:5.58874	validation_1-rmse:5.84693
[362]	validation_0-rmse:5.5837	validation_1-rmse:5.8417
[363]	validation_0-rmse:5.57459	validation_1-rmse:5.83404
[364]	validation_0-rmse:5.56445	validation_1-rmse:5.82664
[365]	validation_0-rmse:5.55884	validation_1-rmse:5.82067
[366]	validation_0-rmse:5.54899	validation_1-rmse:5.81317
[367]	validation_0-rmse:5.54278	validation_1-rmse:5.80626
[368]	validation_0-rmse:5.53642	validation_1-rmse:5.80122
[369]	validation_0-rmse:5.53208	validation_1-rmse:5.79767
[370]	validation_0-rmse:5.52317	validation_1-rmse:5.78977
[371]	validation_0-rmse:5.51768	validation_1-rmse:5.78467
[372]	validation_0-rmse:5.51389	validation_1-rmse:5.78061
[373]	validation_0-rmse:5.50713	validation_1-rmse:5.7746
[374]	validation_0-rmse:5.50184	validation_1-rmse:5.76996
[375]	validation_0-rmse:5.49648	validation_1-rmse:5.76562
[376]	validation_0-rmse:5.49052	validation_1-rmse:5.76091
[377]	validation_0-rmse:5.48289	validation_1-rmse:5.75489
[378]	validation_0-rmse:5.47675	validation_1-rmse:5.74906
[379]	validation_0-rmse:5.46854	validation_1-rmse:5.74276
[380]	validation_0-rmse:5.46442	validation_1-rmse:5.73884
[381]	validation_0-rmse:5.45943	validation_1-rmse:5.73379
[382]	validation_0-rmse:5.45259	validation_1-rmse:5.72741
[383]	validation_0-rmse:5.4471	validation_1-rmse:5.72335
[384]	validation_0-rmse:5.44355	validation_1-rmse:5.71983
[385]	validation_0-rmse:5.43671	validation_1-rmse:5.71249
[386]	validation_0-rmse:5.4294	validation_1-rmse:5.70639
[387]	validation_0-rmse:5.42373	validation_1-rmse:5.70114
[388]	validation_0-rmse:5.41744	validation_1-rmse:5.6969
[389]	validation_0-rmse:5.41285	validation_1-rmse:5.69384
[390]	validation_0-rmse:5.40821	validation_1-rmse:5.68897
[391]	validation_0-rmse:5.40361	validation_1-rmse:5.68403
[392]	validation_0-rmse:5.39685	validation_1-rmse:5.67638
[393]	validation_0-rmse:5.38989	validation_1-rmse:5.67106
[394]	validation_0-rmse:5.38237	validation_1-rmse:5.66467
[395]	validation_0-rmse:5.37674	validation_1-rmse:5.66049
[396]	validation_0-rmse:5.37073	validation_1-rmse:5.65745
[397]	validation_0-rmse:5.36395	validation_1-rmse:5.65233
[398]	validation_0-rmse:5.35764	validation_1-rmse:5.64723
[399]	validation_0-rmse:5.35364	validation_1-rmse:5.64412
[400]	validation_0-rmse:5.34947	validation_1-rmse:5.64016
[401]	validation_0-rmse:5.34514	validation_1-rmse:5.63615
[402]	validation_0-rmse:5.34077	validation_1-rmse:5.63314
[403]	validation_0-rmse:5.33509	validation_1-rmse:5.62898
[404]	validation_0-rmse:5.33009	validation_1-rmse:5.62498
[405]	validation_0-rmse:5.32433	validation_1-rmse:5.62096
[406]	validation_0-rmse:5.31876	validation_1-rmse:5.61453
[407]	validation_0-rmse:5.31485	validation_1-rmse:5.6117
[408]	validation_0-rmse:5.3119	validation_1-rmse:5.60914
[409]	validation_0-rmse:5.30854	validation_1-rmse:5.60553
[410]	validation_0-rmse:5.30178	validation_1-rmse:5.60139
[411]	validation_0-rmse:5.29775	validation_1-rmse:5.59704
[412]	validation_0-rmse:5.29342	validation_1-rmse:5.59299
[413]	validation_0-rmse:5.28704	validation_1-rmse:5.58801
[414]	validation_0-rmse:5.28237	validation_1-rmse:5.58388
[415]	validation_0-rmse:5.2764	validation_1-rmse:5.57949
[416]	validation_0-rmse:5.27227	validation_1-rmse:5.57687
[417]	validation_0-rmse:5.26833	validation_1-rmse:5.57571
[418]	validation_0-rmse:5.26537	validation_1-rmse:5.5734
[419]	validation_0-rmse:5.26153	validation_1-rmse:5.56967
[420]	validation_0-rmse:5.2569	validation_1-rmse:5.56628
[421]	validation_0-rmse:5.25199	validation_1-rmse:5.56135
[422]	validation_0-rmse:5.2475	validation_1-rmse:5.55721
[423]	validation_0-rmse:5.24157	validation_1-rmse:5.55282
[424]	validation_0-rmse:5.23617	validation_1-rmse:5.54825
[425]	validation_0-rmse:5.23188	validation_1-rmse:5.54489
[426]	validation_0-rmse:5.2276	validation_1-rmse:5.54229
[427]	validation_0-rmse:5.22477	validation_1-rmse:5.53961
[428]	validation_0-rmse:5.22046	validation_1-rmse:5.53482
[429]	validation_0-rmse:5.21646	validation_1-rmse:5.53164
[430]	validation_0-rmse:5.21353	validation_1-rmse:5.5298
[431]	validation_0-rmse:5.21118	validation_1-rmse:5.52682
[432]	validation_0-rmse:5.20724	validation_1-rmse:5.52219
[433]	validation_0-rmse:5.20266	validation_1-rmse:5.51847
[434]	validation_0-rmse:5.19857	validation_1-rmse:5.51545
[435]	validation_0-rmse:5.19388	validation_1-rmse:5.51095
[436]	validation_0-rmse:5.18978	validation_1-rmse:5.50817
[437]	validation_0-rmse:5.18423	validation_1-rmse:5.50363
[438]	validation_0-rmse:5.17859	validation_1-rmse:5.50059
[439]	validation_0-rmse:5.1756	validation_1-rmse:5.49761
[440]	validation_0-rmse:5.17263	validation_1-rmse:5.49309
[441]	validation_0-rmse:5.16756	validation_1-rmse:5.48965
[442]	validation_0-rmse:5.16287	validation_1-rmse:5.48574
[443]	validation_0-rmse:5.15993	validation_1-rmse:5.48402
[444]	validation_0-rmse:5.15759	validation_1-rmse:5.48277
[445]	validation_0-rmse:5.15211	validation_1-rmse:5.47875
[446]	validation_0-rmse:5.14756	validation_1-rmse:5.47483
[447]	validation_0-rmse:5.14369	validation_1-rmse:5.47242
[448]	validation_0-rmse:5.14046	validation_1-rmse:5.47013
[449]	validation_0-rmse:5.13445	validation_1-rmse:5.4672
[450]	validation_0-rmse:5.12953	validation_1-rmse:5.46469
[451]	validation_0-rmse:5.12551	validation_1-rmse:5.46061
[452]	validation_0-rmse:5.12165	validation_1-rmse:5.45828
[453]	validation_0-rmse:5.11767	validation_1-rmse:5.45581
[454]	validation_0-rmse:5.11433	validation_1-rmse:5.45425
[455]	validation_0-rmse:5.11039	validation_1-rmse:5.45112
[456]	validation_0-rmse:5.10818	validation_1-rmse:5.44984
[457]	validation_0-rmse:5.10609	validation_1-rmse:5.44767
[458]	validation_0-rmse:5.10397	validation_1-rmse:5.44653
[459]	validation_0-rmse:5.10036	validation_1-rmse:5.44471
[460]	validation_0-rmse:5.0948	validation_1-rmse:5.44064
[461]	validation_0-rmse:5.09265	validation_1-rmse:5.43956
[462]	validation_0-rmse:5.08616	validation_1-rmse:5.43532
[463]	validation_0-rmse:5.08228	validation_1-rmse:5.43294
[464]	validation_0-rmse:5.07766	validation_1-rmse:5.43001
[465]	validation_0-rmse:5.07559	validation_1-rmse:5.42818
[466]	validation_0-rmse:5.07273	validation_1-rmse:5.42649
[467]	validation_0-rmse:5.06709	validation_1-rmse:5.42382
[468]	validation_0-rmse:5.06337	validation_1-rmse:5.41998
[469]	validation_0-rmse:5.05839	validation_1-rmse:5.41672
[470]	validation_0-rmse:5.05376	validation_1-rmse:5.41384
[471]	validation_0-rmse:5.05108	validation_1-rmse:5.41204
[472]	validation_0-rmse:5.0484	validation_1-rmse:5.41008
[473]	validation_0-rmse:5.04434	validation_1-rmse:5.40923
[474]	validation_0-rmse:5.04051	validation_1-rmse:5.40671
[475]	validation_0-rmse:5.03528	validation_1-rmse:5.40371
[476]	validation_0-rmse:5.03015	validation_1-rmse:5.40064
[477]	validation_0-rmse:5.02794	validation_1-rmse:5.39877
[478]	validation_0-rmse:5.02518	validation_1-rmse:5.39769
[479]	validation_0-rmse:5.02163	validation_1-rmse:5.39489
[480]	validation_0-rmse:5.01522	validation_1-rmse:5.39159
[481]	validation_0-rmse:5.01223	validation_1-rmse:5.38999
[482]	validation_0-rmse:5.01069	validation_1-rmse:5.38904
[483]	validation_0-rmse:5.00607	validation_1-rmse:5.38612
[484]	validation_0-rmse:5.00326	validation_1-rmse:5.38426
[485]	validation_0-rmse:5.00108	validation_1-rmse:5.38077
[486]	validation_0-rmse:4.99824	validation_1-rmse:5.3802
[487]	validation_0-rmse:4.99402	validation_1-rmse:5.37704
[488]	validation_0-rmse:4.99126	validation_1-rmse:5.37469
[489]	validation_0-rmse:4.98859	validation_1-rmse:5.37434
[490]	validation_0-rmse:4.98415	validation_1-rmse:5.37105
[491]	validation_0-rmse:4.98039	validation_1-rmse:5.36905
[492]	validation_0-rmse:4.97492	validation_1-rmse:5.36661
[493]	validation_0-rmse:4.96993	validation_1-rmse:5.36446
[494]	validation_0-rmse:4.96695	validation_1-rmse:5.36137
[495]	validation_0-rmse:4.96307	validation_1-rmse:5.35894
[496]	validation_0-rmse:4.9579	validation_1-rmse:5.35579
[497]	validation_0-rmse:4.95488	validation_1-rmse:5.35458
[498]	validation_0-rmse:4.95263	validation_1-rmse:5.35331
[499]	validation_0-rmse:4.94787	validation_1-rmse:5.35092
[500]	validation_0-rmse:4.94548	validation_1-rmse:5.34739
[501]	validation_0-rmse:4.94041	validation_1-rmse:5.34441
[502]	validation_0-rmse:4.93737	validation_1-rmse:5.34198
[503]	validation_0-rmse:4.93276	validation_1-rmse:5.34157
[504]	validation_0-rmse:4.92797	validation_1-rmse:5.33924
[505]	validation_0-rmse:4.92557	validation_1-rmse:5.33805
[506]	validation_0-rmse:4.92265	validation_1-rmse:5.3365
[507]	validation_0-rmse:4.92007	validation_1-rmse:5.33441
[508]	validation_0-rmse:4.91708	validation_1-rmse:5.33182
[509]	validation_0-rmse:4.91397	validation_1-rmse:5.33136
[510]	validation_0-rmse:4.91092	validation_1-rmse:5.3297
[511]	validation_0-rmse:4.90739	validation_1-rmse:5.32696
[512]	validation_0-rmse:4.90395	validation_1-rmse:5.32415
[513]	validation_0-rmse:4.90241	validation_1-rmse:5.32226
[514]	validation_0-rmse:4.89915	validation_1-rmse:5.31989
[515]	validation_0-rmse:4.89697	validation_1-rmse:5.31963
[516]	validation_0-rmse:4.89519	validation_1-rmse:5.31904
[517]	validation_0-rmse:4.89146	validation_1-rmse:5.31626
[518]	validation_0-rmse:4.88932	validation_1-rmse:5.3155
[519]	validation_0-rmse:4.88421	validation_1-rmse:5.31269
[520]	validation_0-rmse:4.8822	validation_1-rmse:5.31018
[521]	validation_0-rmse:4.87952	validation_1-rmse:5.30917
[522]	validation_0-rmse:4.87735	validation_1-rmse:5.30741
[523]	validation_0-rmse:4.87464	validation_1-rmse:5.30574
[524]	validation_0-rmse:4.87235	validation_1-rmse:5.30366
[525]	validation_0-rmse:4.86843	validation_1-rmse:5.30161
[526]	validation_0-rmse:4.86362	validation_1-rmse:5.29879
[527]	validation_0-rmse:4.85862	validation_1-rmse:5.29597
[528]	validation_0-rmse:4.85445	validation_1-rmse:5.29319
[529]	validation_0-rmse:4.8491	validation_1-rmse:5.28996
[530]	validation_0-rmse:4.8472	validation_1-rmse:5.28921
[531]	validation_0-rmse:4.84538	validation_1-rmse:5.28806
[532]	validation_0-rmse:4.84298	validation_1-rmse:5.28743
[533]	validation_0-rmse:4.83913	validation_1-rmse:5.28489
[534]	validation_0-rmse:4.83762	validation_1-rmse:5.28408
[535]	validation_0-rmse:4.8345	validation_1-rmse:5.28277
[536]	validation_0-rmse:4.83283	validation_1-rmse:5.28215
[537]	validation_0-rmse:4.82917	validation_1-rmse:5.27955
[538]	validation_0-rmse:4.82623	validation_1-rmse:5.27663
[539]	validation_0-rmse:4.8222	validation_1-rmse:5.27258
[540]	validation_0-rmse:4.81727	validation_1-rmse:5.26847
[541]	validation_0-rmse:4.81566	validation_1-rmse:5.26714
[542]	validation_0-rmse:4.81225	validation_1-rmse:5.26648
[543]	validation_0-rmse:4.80955	validation_1-rmse:5.26472
[544]	validation_0-rmse:4.80686	validation_1-rmse:5.2636
[545]	validation_0-rmse:4.80549	validation_1-rmse:5.2626
[546]	validation_0-rmse:4.8032	validation_1-rmse:5.26046
[547]	validation_0-rmse:4.79774	validation_1-rmse:5.258
[548]	validation_0-rmse:4.79592	validation_1-rmse:5.2574
[549]	validation_0-rmse:4.79232	validation_1-rmse:5.25627
[550]	validation_0-rmse:4.78965	validation_1-rmse:5.25444
[551]	validation_0-rmse:4.78677	validation_1-rmse:5.25319
[552]	validation_0-rmse:4.78434	validation_1-rmse:5.2517
[553]	validation_0-rmse:4.78141	validation_1-rmse:5.24939
[554]	validation_0-rmse:4.77997	validation_1-rmse:5.24887
[555]	validation_0-rmse:4.7779	validation_1-rmse:5.2477
[556]	validation_0-rmse:4.77543	validation_1-rmse:5.24583
[557]	validation_0-rmse:4.7729	validation_1-rmse:5.24367
[558]	validation_0-rmse:4.76894	validation_1-rmse:5.24181
[559]	validation_0-rmse:4.76548	validation_1-rmse:5.2405
[560]	validation_0-rmse:4.7643	validation_1-rmse:5.23975
[561]	validation_0-rmse:4.76138	validation_1-rmse:5.23837
[562]	validation_0-rmse:4.75817	validation_1-rmse:5.2367
[563]	validation_0-rmse:4.75672	validation_1-rmse:5.23572
[564]	validation_0-rmse:4.75467	validation_1-rmse:5.23418
[565]	validation_0-rmse:4.75041	validation_1-rmse:5.232
[566]	validation_0-rmse:4.74763	validation_1-rmse:5.23048
[567]	validation_0-rmse:4.74321	validation_1-rmse:5.22682
[568]	validation_0-rmse:4.73928	validation_1-rmse:5.22573
[569]	validation_0-rmse:4.73611	validation_1-rmse:5.22444
[570]	validation_0-rmse:4.73286	validation_1-rmse:5.2226
[571]	validation_0-rmse:4.72974	validation_1-rmse:5.22111
[572]	validation_0-rmse:4.72745	validation_1-rmse:5.22002
[573]	validation_0-rmse:4.72544	validation_1-rmse:5.2191
[574]	validation_0-rmse:4.72149	validation_1-rmse:5.21676
[575]	validation_0-rmse:4.71767	validation_1-rmse:5.21474
[576]	validation_0-rmse:4.71308	validation_1-rmse:5.21266
[577]	validation_0-rmse:4.70917	validation_1-rmse:5.21107
[578]	validation_0-rmse:4.70657	validation_1-rmse:5.20877
[579]	validation_0-rmse:4.70488	validation_1-rmse:5.2079
[580]	validation_0-rmse:4.70183	validation_1-rmse:5.20699
[581]	validation_0-rmse:4.7002	validation_1-rmse:5.20715
[582]	validation_0-rmse:4.69568	validation_1-rmse:5.20543
[583]	validation_0-rmse:4.69289	validation_1-rmse:5.20406
[584]	validation_0-rmse:4.69015	validation_1-rmse:5.20218
[585]	validation_0-rmse:4.6867	validation_1-rmse:5.20152
[586]	validation_0-rmse:4.68306	validation_1-rmse:5.20013
[587]	validation_0-rmse:4.67887	validation_1-rmse:5.19834
[588]	validation_0-rmse:4.67563	validation_1-rmse:5.19653
[589]	validation_0-rmse:4.67399	validation_1-rmse:5.1963
[590]	validation_0-rmse:4.67146	validation_1-rmse:5.19452
[591]	validation_0-rmse:4.66913	validation_1-rmse:5.19333
[592]	validation_0-rmse:4.66692	validation_1-rmse:5.19285
[593]	validation_0-rmse:4.66321	validation_1-rmse:5.1902
[594]	validation_0-rmse:4.66043	validation_1-rmse:5.18929
[595]	validation_0-rmse:4.65704	validation_1-rmse:5.18874
[596]	validation_0-rmse:4.65412	validation_1-rmse:5.18634
[597]	validation_0-rmse:4.65215	validation_1-rmse:5.18543
[598]	validation_0-rmse:4.64856	validation_1-rmse:5.18365
[599]	validation_0-rmse:4.6459	validation_1-rmse:5.18307
[600]	validation_0-rmse:4.64394	validation_1-rmse:5.18339
[601]	validation_0-rmse:4.64204	validation_1-rmse:5.18392
[602]	validation_0-rmse:4.64007	validation_1-rmse:5.18293
[603]	validation_0-rmse:4.63854	validation_1-rmse:5.18261
[604]	validation_0-rmse:4.63647	validation_1-rmse:5.18333
[605]	validation_0-rmse:4.63277	validation_1-rmse:5.18227
[606]	validation_0-rmse:4.63148	validation_1-rmse:5.18183
[607]	validation_0-rmse:4.62984	validation_1-rmse:5.18123
[608]	validation_0-rmse:4.62744	validation_1-rmse:5.17936
[609]	validation_0-rmse:4.62587	validation_1-rmse:5.17908
[610]	validation_0-rmse:4.62352	validation_1-rmse:5.17894
[611]	validation_0-rmse:4.61979	validation_1-rmse:5.17694
[612]	validation_0-rmse:4.61714	validation_1-rmse:5.17448
[613]	validation_0-rmse:4.61337	validation_1-rmse:5.17329
[614]	validation_0-rmse:4.6101	validation_1-rmse:5.17327
[615]	validation_0-rmse:4.60751	validation_1-rmse:5.17231
[616]	validation_0-rmse:4.60496	validation_1-rmse:5.17245
[617]	validation_0-rmse:4.60354	validation_1-rmse:5.17263
[618]	validation_0-rmse:4.60019	validation_1-rmse:5.17143
[619]	validation_0-rmse:4.5974	validation_1-rmse:5.1699
[620]	validation_0-rmse:4.59522	validation_1-rmse:5.17006
[621]	validation_0-rmse:4.59225	validation_1-rmse:5.1684
[622]	validation_0-rmse:4.58974	validation_1-rmse:5.16698
[623]	validation_0-rmse:4.58692	validation_1-rmse:5.16575
[624]	validation_0-rmse:4.5851	validation_1-rmse:5.16524
[625]	validation_0-rmse:4.58197	validation_1-rmse:5.1635
[626]	validation_0-rmse:4.57863	validation_1-rmse:5.16169
[627]	validation_0-rmse:4.57742	validation_1-rmse:5.16127
[628]	validation_0-rmse:4.57528	validation_1-rmse:5.15959
[629]	validation_0-rmse:4.57205	validation_1-rmse:5.15867
[630]	validation_0-rmse:4.56914	validation_1-rmse:5.15715
[631]	validation_0-rmse:4.56706	validation_1-rmse:5.15682
[632]	validation_0-rmse:4.56551	validation_1-rmse:5.15688
[633]	validation_0-rmse:4.56301	validation_1-rmse:5.15518
[634]	validation_0-rmse:4.56088	validation_1-rmse:5.15388
[635]	validation_0-rmse:4.55982	validation_1-rmse:5.15355
[636]	validation_0-rmse:4.55778	validation_1-rmse:5.15381
[637]	validation_0-rmse:4.55512	validation_1-rmse:5.15319
[638]	validation_0-rmse:4.55256	validation_1-rmse:5.15149
[639]	validation_0-rmse:4.55018	validation_1-rmse:5.15128
[640]	validation_0-rmse:4.54726	validation_1-rmse:5.15061
[641]	validation_0-rmse:4.54469	validation_1-rmse:5.1485
[642]	validation_0-rmse:4.543	validation_1-rmse:5.14897
[643]	validation_0-rmse:4.54139	validation_1-rmse:5.14872
[644]	validation_0-rmse:4.53988	validation_1-rmse:5.14929
[645]	validation_0-rmse:4.53733	validation_1-rmse:5.14705
[646]	validation_0-rmse:4.53382	validation_1-rmse:5.1457
[647]	validation_0-rmse:4.5321	validation_1-rmse:5.1454
[648]	validation_0-rmse:4.52965	validation_1-rmse:5.14486
[649]	validation_0-rmse:4.52629	validation_1-rmse:5.14333
[650]	validation_0-rmse:4.52313	validation_1-rmse:5.14288
[651]	validation_0-rmse:4.52078	validation_1-rmse:5.1422
[652]	validation_0-rmse:4.51696	validation_1-rmse:5.14087
[653]	validation_0-rmse:4.51364	validation_1-rmse:5.13851
[654]	validation_0-rmse:4.51198	validation_1-rmse:5.13888
[655]	validation_0-rmse:4.50931	validation_1-rmse:5.13718
[656]	validation_0-rmse:4.50789	validation_1-rmse:5.13678
[657]	validation_0-rmse:4.50684	validation_1-rmse:5.13574
[658]	validation_0-rmse:4.5036	validation_1-rmse:5.13432
[659]	validation_0-rmse:4.50181	validation_1-rmse:5.13405
[660]	validation_0-rmse:4.49984	validation_1-rmse:5.13405
[661]	validation_0-rmse:4.49646	validation_1-rmse:5.13276
[662]	validation_0-rmse:4.49505	validation_1-rmse:5.13252
[663]	validation_0-rmse:4.4914	validation_1-rmse:5.13192
[664]	validation_0-rmse:4.4899	validation_1-rmse:5.1327
[665]	validation_0-rmse:4.48619	validation_1-rmse:5.13282
[666]	validation_0-rmse:4.48511	validation_1-rmse:5.13252
[667]	validation_0-rmse:4.48406	validation_1-rmse:5.13276
[668]	validation_0-rmse:4.48222	validation_1-rmse:5.13263
[669]	validation_0-rmse:4.48074	validation_1-rmse:5.13192
[670]	validation_0-rmse:4.47757	validation_1-rmse:5.13157
[671]	validation_0-rmse:4.47408	validation_1-rmse:5.13046
[672]	validation_0-rmse:4.47152	validation_1-rmse:5.12884
[673]	validation_0-rmse:4.46912	validation_1-rmse:5.12608
[674]	validation_0-rmse:4.46676	validation_1-rmse:5.12533
[675]	validation_0-rmse:4.4645	validation_1-rmse:5.12475
[676]	validation_0-rmse:4.46242	validation_1-rmse:5.12445
[677]	validation_0-rmse:4.46016	validation_1-rmse:5.12415
[678]	validation_0-rmse:4.45769	validation_1-rmse:5.12315
[679]	validation_0-rmse:4.45657	validation_1-rmse:5.12279
[680]	validation_0-rmse:4.45421	validation_1-rmse:5.12251
[681]	validation_0-rmse:4.45351	validation_1-rmse:5.12207
[682]	validation_0-rmse:4.45154	validation_1-rmse:5.12248
[683]	validation_0-rmse:4.44836	validation_1-rmse:5.12116
[684]	validation_0-rmse:4.44596	validation_1-rmse:5.12089
[685]	validation_0-rmse:4.44321	validation_1-rmse:5.11914
[686]	validation_0-rmse:4.44263	validation_1-rmse:5.11874
[687]	validation_0-rmse:4.43919	validation_1-rmse:5.11734
[688]	validation_0-rmse:4.43816	validation_1-rmse:5.117
[689]	validation_0-rmse:4.43714	validation_1-rmse:5.11683
[690]	validation_0-rmse:4.43485	validation_1-rmse:5.11538
[691]	validation_0-rmse:4.43243	validation_1-rmse:5.1148
[692]	validation_0-rmse:4.42986	validation_1-rmse:5.11357
[693]	validation_0-rmse:4.42779	validation_1-rmse:5.11307
[694]	validation_0-rmse:4.42675	validation_1-rmse:5.11236
[695]	validation_0-rmse:4.42556	validation_1-rmse:5.1127
[696]	validation_0-rmse:4.42441	validation_1-rmse:5.11221
[697]	validation_0-rmse:4.42125	validation_1-rmse:5.11087
[698]	validation_0-rmse:4.4196	validation_1-rmse:5.11115
[699]	validation_0-rmse:4.41824	validation_1-rmse:5.11107
[700]	validation_0-rmse:4.41596	validation_1-rmse:5.11076
[701]	validation_0-rmse:4.41375	validation_1-rmse:5.11009
[702]	validation_0-rmse:4.41274	validation_1-rmse:5.10992
[703]	validation_0-rmse:4.41119	validation_1-rmse:5.1095
[704]	validation_0-rmse:4.4084	validation_1-rmse:5.10937
[705]	validation_0-rmse:4.40618	validation_1-rmse:5.10891
[706]	validation_0-rmse:4.4047	validation_1-rmse:5.10751
[707]	validation_0-rmse:4.40318	validation_1-rmse:5.10708
[708]	validation_0-rmse:4.40133	validation_1-rmse:5.10624
[709]	validation_0-rmse:4.40022	validation_1-rmse:5.107
[710]	validation_0-rmse:4.39702	validation_1-rmse:5.10626
[711]	validation_0-rmse:4.39614	validation_1-rmse:5.10583
[712]	validation_0-rmse:4.39313	validation_1-rmse:5.10523
[713]	validation_0-rmse:4.39166	validation_1-rmse:5.10615
[714]	validation_0-rmse:4.39022	validation_1-rmse:5.10637
[715]	validation_0-rmse:4.3884	validation_1-rmse:5.10656
[716]	validation_0-rmse:4.38619	validation_1-rmse:5.10703
[717]	validation_0-rmse:4.38313	validation_1-rmse:5.10674
[718]	validation_0-rmse:4.37941	validation_1-rmse:5.10548
[719]	validation_0-rmse:4.3766	validation_1-rmse:5.10491
[720]	validation_0-rmse:4.37462	validation_1-rmse:5.10422
[721]	validation_0-rmse:4.37084	validation_1-rmse:5.10292
[722]	validation_0-rmse:4.36757	validation_1-rmse:5.10243
[723]	validation_0-rmse:4.36573	validation_1-rmse:5.10241
[724]	validation_0-rmse:4.36384	validation_1-rmse:5.10154
[725]	validation_0-rmse:4.36157	validation_1-rmse:5.10068
[726]	validation_0-rmse:4.35835	validation_1-rmse:5.09845
[727]	validation_0-rmse:4.35576	validation_1-rmse:5.09656
[728]	validation_0-rmse:4.35282	validation_1-rmse:5.09584
[729]	validation_0-rmse:4.34909	validation_1-rmse:5.09476
[730]	validation_0-rmse:4.34627	validation_1-rmse:5.09422
[731]	validation_0-rmse:4.34479	validation_1-rmse:5.09383
[732]	validation_0-rmse:4.34243	validation_1-rmse:5.09293
[733]	validation_0-rmse:4.34077	validation_1-rmse:5.09199
[734]	validation_0-rmse:4.33903	validation_1-rmse:5.09195
[735]	validation_0-rmse:4.3381	validation_1-rmse:5.09189
[736]	validation_0-rmse:4.33693	validation_1-rmse:5.09221
[737]	validation_0-rmse:4.33562	validation_1-rmse:5.09242
[738]	validation_0-rmse:4.33168	validation_1-rmse:5.09006
[739]	validation_0-rmse:4.33055	validation_1-rmse:5.08957
[740]	validation_0-rmse:4.32846	validation_1-rmse:5.08865
[741]	validation_0-rmse:4.32594	validation_1-rmse:5.08757
[742]	validation_0-rmse:4.32451	validation_1-rmse:5.08732
[743]	validation_0-rmse:4.32099	validation_1-rmse:5.08664
[744]	validation_0-rmse:4.31822	validation_1-rmse:5.08525
[745]	validation_0-rmse:4.31522	validation_1-rmse:5.08333
[746]	validation_0-rmse:4.31169	validation_1-rmse:5.0826
[747]	validation_0-rmse:4.31081	validation_1-rmse:5.0825
[748]	validation_0-rmse:4.30966	validation_1-rmse:5.08305
[749]	validation_0-rmse:4.30733	validation_1-rmse:5.08215
[750]	validation_0-rmse:4.3052	validation_1-rmse:5.08181
[751]	validation_0-rmse:4.30316	validation_1-rmse:5.08035
[752]	validation_0-rmse:4.3007	validation_1-rmse:5.07906
[753]	validation_0-rmse:4.29995	validation_1-rmse:5.0797
[754]	validation_0-rmse:4.29723	validation_1-rmse:5.07921
[755]	validation_0-rmse:4.2937	validation_1-rmse:5.07832
[756]	validation_0-rmse:4.29216	validation_1-rmse:5.07769
[757]	validation_0-rmse:4.28982	validation_1-rmse:5.07769
[758]	validation_0-rmse:4.28707	validation_1-rmse:5.07776
[759]	validation_0-rmse:4.28446	validation_1-rmse:5.07706
[760]	validation_0-rmse:4.28211	validation_1-rmse:5.07636
[761]	validation_0-rmse:4.28018	validation_1-rmse:5.07629
[762]	validation_0-rmse:4.27815	validation_1-rmse:5.07513
[763]	validation_0-rmse:4.27545	validation_1-rmse:5.07383
[764]	validation_0-rmse:4.27205	validation_1-rmse:5.07309
[765]	validation_0-rmse:4.27045	validation_1-rmse:5.07264
[766]	validation_0-rmse:4.26764	validation_1-rmse:5.07214
[767]	validation_0-rmse:4.26516	validation_1-rmse:5.0707
[768]	validation_0-rmse:4.26361	validation_1-rmse:5.07104
[769]	validation_0-rmse:4.26255	validation_1-rmse:5.07091
[770]	validation_0-rmse:4.26032	validation_1-rmse:5.0707
[771]	validation_0-rmse:4.25896	validation_1-rmse:5.06997
[772]	validation_0-rmse:4.25723	validation_1-rmse:5.06872
[773]	validation_0-rmse:4.25441	validation_1-rmse:5.06733
[774]	validation_0-rmse:4.25311	validation_1-rmse:5.06681
[775]	validation_0-rmse:4.25124	validation_1-rmse:5.06719
[776]	validation_0-rmse:4.2491	validation_1-rmse:5.06668
[777]	validation_0-rmse:4.2459	validation_1-rmse:5.06439
[778]	validation_0-rmse:4.24398	validation_1-rmse:5.06381
[779]	validation_0-rmse:4.24202	validation_1-rmse:5.06334
[780]	validation_0-rmse:4.2392	validation_1-rmse:5.06319
[781]	validation_0-rmse:4.23793	validation_1-rmse:5.0636
[782]	validation_0-rmse:4.23599	validation_1-rmse:5.06307
[783]	validation_0-rmse:4.23424	validation_1-rmse:5.06329
[784]	validation_0-rmse:4.23212	validation_1-rmse:5.06271
[785]	validation_0-rmse:4.22861	validation_1-rmse:5.06192
[786]	validation_0-rmse:4.22782	validation_1-rmse:5.06177
[787]	validation_0-rmse:4.22623	validation_1-rmse:5.06155
[788]	validation_0-rmse:4.22389	validation_1-rmse:5.06165
[789]	validation_0-rmse:4.2218	validation_1-rmse:5.06008
[790]	validation_0-rmse:4.2203	validation_1-rmse:5.05865
[791]	validation_0-rmse:4.21957	validation_1-rmse:5.05791
[792]	validation_0-rmse:4.21851	validation_1-rmse:5.05778
[793]	validation_0-rmse:4.21767	validation_1-rmse:5.05753
[794]	validation_0-rmse:4.21562	validation_1-rmse:5.05771
[795]	validation_0-rmse:4.21301	validation_1-rmse:5.05564
[796]	validation_0-rmse:4.21027	validation_1-rmse:5.05415
[797]	validation_0-rmse:4.20761	validation_1-rmse:5.0534
[798]	validation_0-rmse:4.20646	validation_1-rmse:5.05387
[799]	validation_0-rmse:4.20448	validation_1-rmse:5.0541
[800]	validation_0-rmse:4.20281	validation_1-rmse:5.05398
[801]	validation_0-rmse:4.20034	validation_1-rmse:5.05383
[802]	validation_0-rmse:4.19776	validation_1-rmse:5.05436
[803]	validation_0-rmse:4.19721	validation_1-rmse:5.05418
[804]	validation_0-rmse:4.19537	validation_1-rmse:5.0547
[805]	validation_0-rmse:4.19461	validation_1-rmse:5.05431
[806]	validation_0-rmse:4.19216	validation_1-rmse:5.05347
[807]	validation_0-rmse:4.1913	validation_1-rmse:5.05333
[808]	validation_0-rmse:4.18941	validation_1-rmse:5.0541
[809]	validation_0-rmse:4.18632	validation_1-rmse:5.05343
[810]	validation_0-rmse:4.18509	validation_1-rmse:5.05369
[811]	validation_0-rmse:4.18371	validation_1-rmse:5.05393
[812]	validation_0-rmse:4.1829	validation_1-rmse:5.05371
[813]	validation_0-rmse:4.18113	validation_1-rmse:5.05328
[814]	validation_0-rmse:4.17975	validation_1-rmse:5.05386
[815]	validation_0-rmse:4.179	validation_1-rmse:5.05402
[816]	validation_0-rmse:4.17752	validation_1-rmse:5.05494
[817]	validation_0-rmse:4.17609	validation_1-rmse:5.05514
[818]	validation_0-rmse:4.17428	validation_1-rmse:5.05563
[819]	validation_0-rmse:4.17222	validation_1-rmse:5.05536
[820]	validation_0-rmse:4.17116	validation_1-rmse:5.05595
[821]	validation_0-rmse:4.17021	validation_1-rmse:5.05559
[822]	validation_0-rmse:4.16714	validation_1-rmse:5.05423
[823]	validation_0-rmse:4.16436	validation_1-rmse:5.05406
[824]	validation_0-rmse:4.16265	validation_1-rmse:5.0551
[825]	validation_0-rmse:4.16101	validation_1-rmse:5.05519
[826]	validation_0-rmse:4.15912	validation_1-rmse:5.05477
[827]	validation_0-rmse:4.15677	validation_1-rmse:5.05439
[828]	validation_0-rmse:4.15485	validation_1-rmse:5.05475
[829]	validation_0-rmse:4.15308	validation_1-rmse:5.05541
[830]	validation_0-rmse:4.1507	validation_1-rmse:5.05454
[831]	validation_0-rmse:4.14874	validation_1-rmse:5.05434
[832]	validation_0-rmse:4.14683	validation_1-rmse:5.05423
[833]	validation_0-rmse:4.14456	validation_1-rmse:5.05316
[834]	validation_0-rmse:4.14191	validation_1-rmse:5.05282
[835]	validation_0-rmse:4.13953	validation_1-rmse:5.05279
[836]	validation_0-rmse:4.13812	validation_1-rmse:5.05343
[837]	validation_0-rmse:4.13567	validation_1-rmse:5.05338
[838]	validation_0-rmse:4.1341	validation_1-rmse:5.05364
[839]	validation_0-rmse:4.13272	validation_1-rmse:5.053
[840]	validation_0-rmse:4.13134	validation_1-rmse:5.05248
[841]	validation_0-rmse:4.12952	validation_1-rmse:5.05283
[842]	validation_0-rmse:4.12809	validation_1-rmse:5.05277
[843]	validation_0-rmse:4.12566	validation_1-rmse:5.05165
[844]	validation_0-rmse:4.12205	validation_1-rmse:5.05102
[845]	validation_0-rmse:4.11972	validation_1-rmse:5.05095
[846]	validation_0-rmse:4.11769	validation_1-rmse:5.05131
[847]	validation_0-rmse:4.11521	validation_1-rmse:5.05161
[848]	validation_0-rmse:4.11362	validation_1-rmse:5.0506
[849]	validation_0-rmse:4.11217	validation_1-rmse:5.04966
[850]	validation_0-rmse:4.1116	validation_1-rmse:5.04941
[851]	validation_0-rmse:4.10874	validation_1-rmse:5.05034
[852]	validation_0-rmse:4.10758	validation_1-rmse:5.05063
[853]	validation_0-rmse:4.10568	validation_1-rmse:5.04932
[854]	validation_0-rmse:4.10394	validation_1-rmse:5.0493
[855]	validation_0-rmse:4.1029	validation_1-rmse:5.04841
[856]	validation_0-rmse:4.10174	validation_1-rmse:5.0493
[857]	validation_0-rmse:4.09997	validation_1-rmse:5.0478
[858]	validation_0-rmse:4.09904	validation_1-rmse:5.04829
[859]	validation_0-rmse:4.09771	validation_1-rmse:5.04873
[860]	validation_0-rmse:4.09609	validation_1-rmse:5.04886
[861]	validation_0-rmse:4.09525	validation_1-rmse:5.04824
[862]	validation_0-rmse:4.09438	validation_1-rmse:5.04788
[863]	validation_0-rmse:4.09377	validation_1-rmse:5.04774
[864]	validation_0-rmse:4.09254	validation_1-rmse:5.04861
[865]	validation_0-rmse:4.09121	validation_1-rmse:5.04909
[866]	validation_0-rmse:4.09006	validation_1-rmse:5.04943
[867]	validation_0-rmse:4.08857	validation_1-rmse:5.04997
[868]	validation_0-rmse:4.08693	validation_1-rmse:5.04991
[869]	validation_0-rmse:4.08564	validation_1-rmse:5.04942
[870]	validation_0-rmse:4.0836	validation_1-rmse:5.04896
[871]	validation_0-rmse:4.08256	validation_1-rmse:5.04942
[872]	validation_0-rmse:4.08151	validation_1-rmse:5.04792
[873]	validation_0-rmse:4.07912	validation_1-rmse:5.04815
[874]	validation_0-rmse:4.07801	validation_1-rmse:5.04826
[875]	validation_0-rmse:4.07656	validation_1-rmse:5.04754
[876]	validation_0-rmse:4.07554	validation_1-rmse:5.04634
[877]	validation_0-rmse:4.07441	validation_1-rmse:5.04647
[878]	validation_0-rmse:4.07254	validation_1-rmse:5.04588
[879]	validation_0-rmse:4.06926	validation_1-rmse:5.04519
[880]	validation_0-rmse:4.06693	validation_1-rmse:5.04519
[881]	validation_0-rmse:4.06602	validation_1-rmse:5.04503
[882]	validation_0-rmse:4.06507	validation_1-rmse:5.04519
[883]	validation_0-rmse:4.06264	validation_1-rmse:5.04502
[884]	validation_0-rmse:4.06018	validation_1-rmse:5.04422
[885]	validation_0-rmse:4.0586	validation_1-rmse:5.04351
[886]	validation_0-rmse:4.05586	validation_1-rmse:5.04413
[887]	validation_0-rmse:4.05461	validation_1-rmse:5.044
[888]	validation_0-rmse:4.05262	validation_1-rmse:5.04374
[889]	validation_0-rmse:4.05078	validation_1-rmse:5.04336
[890]	validation_0-rmse:4.04873	validation_1-rmse:5.04294
[891]	validation_0-rmse:4.04772	validation_1-rmse:5.04284
[892]	validation_0-rmse:4.04622	validation_1-rmse:5.0427
[893]	validation_0-rmse:4.04504	validation_1-rmse:5.04368
[894]	validation_0-rmse:4.04401	validation_1-rmse:5.04271
[895]	validation_0-rmse:4.04147	validation_1-rmse:5.043
[896]	validation_0-rmse:4.0396	validation_1-rmse:5.04237
[897]	validation_0-rmse:4.03751	validation_1-rmse:5.04222
[898]	validation_0-rmse:4.03664	validation_1-rmse:5.04287
[899]	validation_0-rmse:4.03484	validation_1-rmse:5.04227
[900]	validation_0-rmse:4.03356	validation_1-rmse:5.04239
[901]	validation_0-rmse:4.03254	validation_1-rmse:5.04239
[902]	validation_0-rmse:4.03096	validation_1-rmse:5.04215
[903]	validation_0-rmse:4.02862	validation_1-rmse:5.04077
[904]	validation_0-rmse:4.02717	validation_1-rmse:5.03947
[905]	validation_0-rmse:4.02592	validation_1-rmse:5.03944
[906]	validation_0-rmse:4.02494	validation_1-rmse:5.03995
[907]	validation_0-rmse:4.0239	validation_1-rmse:5.04031
[908]	validation_0-rmse:4.02144	validation_1-rmse:5.03956
[909]	validation_0-rmse:4.01947	validation_1-rmse:5.04002
[910]	validation_0-rmse:4.01786	validation_1-rmse:5.03958
[911]	validation_0-rmse:4.01641	validation_1-rmse:5.0405
[912]	validation_0-rmse:4.01422	validation_1-rmse:5.04004
[913]	validation_0-rmse:4.01156	validation_1-rmse:5.03943
[914]	validation_0-rmse:4.00934	validation_1-rmse:5.03893
[915]	validation_0-rmse:4.00723	validation_1-rmse:5.03908
[916]	validation_0-rmse:4.00593	validation_1-rmse:5.03873
[917]	validation_0-rmse:4.00383	validation_1-rmse:5.03769
[918]	validation_0-rmse:4.00315	validation_1-rmse:5.03854
[919]	validation_0-rmse:4.00139	validation_1-rmse:5.03721
[920]	validation_0-rmse:3.9986	validation_1-rmse:5.03694
[921]	validation_0-rmse:3.996	validation_1-rmse:5.03554
[922]	validation_0-rmse:3.99443	validation_1-rmse:5.03506
[923]	validation_0-rmse:3.99208	validation_1-rmse:5.03366
[924]	validation_0-rmse:3.98985	validation_1-rmse:5.03369
[925]	validation_0-rmse:3.98792	validation_1-rmse:5.0336
[926]	validation_0-rmse:3.9861	validation_1-rmse:5.03235
[927]	validation_0-rmse:3.98383	validation_1-rmse:5.03247
[928]	validation_0-rmse:3.98223	validation_1-rmse:5.03151
[929]	validation_0-rmse:3.98042	validation_1-rmse:5.03149
[930]	validation_0-rmse:3.97798	validation_1-rmse:5.03001
[931]	validation_0-rmse:3.97665	validation_1-rmse:5.0293
[932]	validation_0-rmse:3.97465	validation_1-rmse:5.02907
[933]	validation_0-rmse:3.97308	validation_1-rmse:5.02922
[934]	validation_0-rmse:3.97141	validation_1-rmse:5.02878
[935]	validation_0-rmse:3.96977	validation_1-rmse:5.02865
[936]	validation_0-rmse:3.96835	validation_1-rmse:5.02838
[937]	validation_0-rmse:3.96565	validation_1-rmse:5.02814
[938]	validation_0-rmse:3.96492	validation_1-rmse:5.02865
[939]	validation_0-rmse:3.96336	validation_1-rmse:5.02702
[940]	validation_0-rmse:3.96153	validation_1-rmse:5.02726
[941]	validation_0-rmse:3.96014	validation_1-rmse:5.02697
[942]	validation_0-rmse:3.95933	validation_1-rmse:5.02667
[943]	validation_0-rmse:3.95895	validation_1-rmse:5.02649
[944]	validation_0-rmse:3.95627	validation_1-rmse:5.02583
[945]	validation_0-rmse:3.95495	validation_1-rmse:5.02566
[946]	validation_0-rmse:3.95363	validation_1-rmse:5.02515
[947]	validation_0-rmse:3.95128	validation_1-rmse:5.02473
[948]	validation_0-rmse:3.94946	validation_1-rmse:5.02412
[949]	validation_0-rmse:3.94719	validation_1-rmse:5.02344
[950]	validation_0-rmse:3.94498	validation_1-rmse:5.02294
[951]	validation_0-rmse:3.94378	validation_1-rmse:5.02248
[952]	validation_0-rmse:3.94078	validation_1-rmse:5.02222
[953]	validation_0-rmse:3.93898	validation_1-rmse:5.02198
[954]	validation_0-rmse:3.9377	validation_1-rmse:5.02163
[955]	validation_0-rmse:3.93626	validation_1-rmse:5.02159
[956]	validation_0-rmse:3.93508	validation_1-rmse:5.02093
[957]	validation_0-rmse:3.93353	validation_1-rmse:5.02086
[958]	validation_0-rmse:3.93157	validation_1-rmse:5.02066
[959]	validation_0-rmse:3.92934	validation_1-rmse:5.02078
[960]	validation_0-rmse:3.92816	validation_1-rmse:5.0201
[961]	validation_0-rmse:3.92584	validation_1-rmse:5.01778
[962]	validation_0-rmse:3.92458	validation_1-rmse:5.01754
[963]	validation_0-rmse:3.92275	validation_1-rmse:5.01774
[964]	validation_0-rmse:3.92139	validation_1-rmse:5.01759
[965]	validation_0-rmse:3.91958	validation_1-rmse:5.01723
[966]	validation_0-rmse:3.91805	validation_1-rmse:5.01645
[967]	validation_0-rmse:3.91625	validation_1-rmse:5.01588
[968]	validation_0-rmse:3.91507	validation_1-rmse:5.0161
[969]	validation_0-rmse:3.91397	validation_1-rmse:5.01565
[970]	validation_0-rmse:3.91302	validation_1-rmse:5.01537
[971]	validation_0-rmse:3.91176	validation_1-rmse:5.015
[972]	validation_0-rmse:3.90905	validation_1-rmse:5.01378
[973]	validation_0-rmse:3.90738	validation_1-rmse:5.01329
[974]	validation_0-rmse:3.9059	validation_1-rmse:5.01165
[975]	validation_0-rmse:3.90287	validation_1-rmse:5.01163
[976]	validation_0-rmse:3.90165	validation_1-rmse:5.01087
[977]	validation_0-rmse:3.90029	validation_1-rmse:5.00979
[978]	validation_0-rmse:3.89855	validation_1-rmse:5.00929
[979]	validation_0-rmse:3.89734	validation_1-rmse:5.00981
[980]	validation_0-rmse:3.89636	validation_1-rmse:5.00927
[981]	validation_0-rmse:3.89482	validation_1-rmse:5.00834
[982]	validation_0-rmse:3.89281	validation_1-rmse:5.0085
[983]	validation_0-rmse:3.89144	validation_1-rmse:5.00933
[984]	validation_0-rmse:3.89047	validation_1-rmse:5.00999
[985]	validation_0-rmse:3.88848	validation_1-rmse:5.01026
[986]	validation_0-rmse:3.88716	validation_1-rmse:5.01027
[987]	validation_0-rmse:3.88513	validation_1-rmse:5.0104
[988]	validation_0-rmse:3.8835	validation_1-rmse:5.00957
[989]	validation_0-rmse:3.88168	validation_1-rmse:5.0084
[990]	validation_0-rmse:3.88107	validation_1-rmse:5.00823
[991]	validation_0-rmse:3.87977	validation_1-rmse:5.00709
[992]	validation_0-rmse:3.87768	validation_1-rmse:5.00583
[993]	validation_0-rmse:3.87623	validation_1-rmse:5.00491
[994]	validation_0-rmse:3.87399	validation_1-rmse:5.00469
[995]	validation_0-rmse:3.87284	validation_1-rmse:5.00436
[996]	validation_0-rmse:3.87074	validation_1-rmse:5.00473
[997]	validation_0-rmse:3.86892	validation_1-rmse:5.00397
[998]	validation_0-rmse:3.86695	validation_1-rmse:5.00212
[999]	validation_0-rmse:3.86631	validation_1-rmse:5.0024
Wall time: 2.9 s

RandomizedSearchCV to tune XGB parameters

In [75]:
learning_rate = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30 ]
# Number of trees in GradientBoost
n_estimators = [int(x) for x in np.linspace(start = 100 , stop = 1000, num = 4)]   # returns evenly spaced 10 numbers

# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(3, 10, num = 6)]  # returns evenly spaced numbers can be changed to any

#the fraction of observations to be used in individual tree
subsample = [0.8,0.9,1]

colsample_bytree = [0.8,0.9,1]

gamma = [0.0, 0.1, 0.2 , 0.3, 0.4, 1, 5, 10]

objective=['reg:squarederror']
#
#reg_alpha = [0.3,0.4,0.5]


# Create the random grid
rs_param_grid_xgb = {'n_estimators': n_estimators,
               'max_depth': max_depth,
               'subsample': subsample,
               'colsample_bytree': colsample_bytree,
               'gamma': gamma,
               'learning_rate':learning_rate,
               'objective':objective
                }

pprint.pprint(rs_param_grid_xgb)
{'colsample_bytree': [0.8, 0.9, 1],
 'gamma': [0.0, 0.1, 0.2, 0.3, 0.4, 1, 5, 10],
 'learning_rate': [0.05, 0.1, 0.15, 0.2, 0.25, 0.3],
 'max_depth': [3, 4, 5, 7, 8, 10],
 'n_estimators': [100, 400, 700, 1000],
 'objective': ['reg:squarederror'],
 'subsample': [0.8, 0.9, 1]}
In [76]:
rf_random_xgb = RandomizedSearchCV(estimator=xgb_model, param_distributions=rs_param_grid_xgb,
                              n_iter = 10, scoring='neg_mean_squared_error', 
                              cv = 5, verbose=2, random_state=42, n_jobs=-1,
                              return_train_score=True)

# Fit the random search model
rf_random_xgb.fit(X_train, y_train)
Fitting 5 folds for each of 10 candidates, totalling 50 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:   15.2s
[Parallel(n_jobs=-1)]: Done  50 out of  50 | elapsed:   25.6s finished
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py:813: DeprecationWarning: The default of the `iid` parameter will change from True to False in version 0.22 and will be removed in 0.24. This will change numeric results when test-set sizes are unequal.
  DeprecationWarning)
Out[76]:
RandomizedSearchCV(cv=5, error_score='raise-deprecating',
                   estimator=XGBRegressor(base_score=0.5, booster='gbtree',
                                          colsample_bylevel=1,
                                          colsample_bynode=1,
                                          colsample_bytree=1, gamma=1,
                                          importance_type='gain',
                                          learning_rate=0.01, max_delta_step=0,
                                          max_depth=3, min_child_weight=1,
                                          missing=None, n_estimators=1000,
                                          n_jobs=1, nthread=None,
                                          objective='reg:squarederror',
                                          r...
                   param_distributions={'colsample_bytree': [0.8, 0.9, 1],
                                        'gamma': [0.0, 0.1, 0.2, 0.3, 0.4, 1, 5,
                                                  10],
                                        'learning_rate': [0.05, 0.1, 0.15, 0.2,
                                                          0.25, 0.3],
                                        'max_depth': [3, 4, 5, 7, 8, 10],
                                        'n_estimators': [100, 400, 700, 1000],
                                        'objective': ['reg:squarederror'],
                                        'subsample': [0.8, 0.9, 1]},
                   pre_dispatch='2*n_jobs', random_state=42, refit=True,
                   return_train_score=True, scoring='neg_mean_squared_error',
                   verbose=2)
In [77]:
rf_random_xgb.best_params_
Out[77]:
{'subsample': 0.9,
 'objective': 'reg:squarederror',
 'n_estimators': 700,
 'max_depth': 3,
 'learning_rate': 0.05,
 'gamma': 0.4,
 'colsample_bytree': 0.9}
In [78]:
best_random_xgb = rf_random_xgb.best_estimator_

best_random_xgb.score(X_test , y_test)
Out[78]:
0.9175261760068948

GridSearch using XGB

In [79]:
param_grid_xgb = {
     'colsample_bytree': [0.8, 0.9, 1],
     'subsample': [0.8,0.9,1],
     'n_estimators': [600,800,1000],
     'max_depth': [3, 5, 7, 10],
     'gamma': [0, 1],
     'learning_rate': [0.01, 0.05, 0.1],
     #'reg_alpha': [0.3, 0.4, 0.5],
     'objective': ['reg:squarederror']
}    
In [80]:
grid_search_xgb = GridSearchCV(estimator = xgb_model, param_grid = param_grid_xgb, 
                          cv = 5, n_jobs = -1, verbose = 2, return_train_score=True,scoring='neg_mean_squared_error')
In [81]:
# Fit the grid search to the data
grid_search_xgb.fit(X_train, y_train);
Fitting 5 folds for each of 648 candidates, totalling 3240 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:    8.7s
[Parallel(n_jobs=-1)]: Done 154 tasks      | elapsed:  1.1min
[Parallel(n_jobs=-1)]: Done 357 tasks      | elapsed:  2.6min
[Parallel(n_jobs=-1)]: Done 640 tasks      | elapsed:  4.4min
[Parallel(n_jobs=-1)]: Done 1005 tasks      | elapsed:  6.7min
[Parallel(n_jobs=-1)]: Done 1450 tasks      | elapsed: 10.1min
[Parallel(n_jobs=-1)]: Done 1977 tasks      | elapsed: 13.8min
[Parallel(n_jobs=-1)]: Done 2584 tasks      | elapsed: 17.9min
[Parallel(n_jobs=-1)]: Done 3240 out of 3240 | elapsed: 22.9min finished
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py:813: DeprecationWarning: The default of the `iid` parameter will change from True to False in version 0.22 and will be removed in 0.24. This will change numeric results when test-set sizes are unequal.
  DeprecationWarning)
In [82]:
grid_search_xgb.best_params_
Out[82]:
{'colsample_bytree': 0.8,
 'gamma': 0,
 'learning_rate': 0.1,
 'max_depth': 3,
 'n_estimators': 800,
 'objective': 'reg:squarederror',
 'subsample': 0.9}
In [83]:
best_grid_xgb = grid_search_xgb.best_estimator_
best_score_xgb = best_grid_xgb.score(X_test, y_test)
best_score_xgb
Out[83]:
0.9227246409167946

GradientBoost and XGBoost are giving the same performance, i.e around 92% of accuracy score. Lets use Bootstraping technique to give the confidence interval of the model

In [85]:
from sklearn.utils import resample
import pickle
filename = 'concrete_model.sav'

load_lr_model=pickle.load(open(filename,'rb'))


df_train = X_train.join(y_train)
df_test = X_test.join(y_test)
df_row_merged = pd.concat([df_train, df_test], ignore_index=True)
values = df_row_merged.values
#values = df_test.values

# configure bootstrap
n_iterations = 1000
n_size = int(len(X_train) * 1)


# run bootstrap
stats = list()
for i in range(n_iterations):
    # prepare train and test sets
    train = resample(values, n_samples=n_size)
    test = np.array([x for x in values if x.tolist() not in train.tolist()])
    # fit model
    model = load_lr_model
    model.fit(train[:,:-1], train[:,-1])
    y_test_array = test[:,-1]
    # evaluate model
    predictions = model.predict(test[:,:-1])
    score = model.score(test[:, :-1] , y_test_array)
    stats.append(score)

# plot scores
plt.hist(stats)
plt.show()

# confidence intervals
alpha = 0.95
p = ((1.0-alpha)/2.0) * 100
lower = max(0.0, np.percentile(stats, p))
p = (alpha+((1.0-alpha)/2.0)) * 100
upper = min(1.0, np.percentile(stats, p))
print('%.1f confidence interval %.1f%% and %.1f%%' % (alpha*100, lower*100, upper*100))
95.0 confidence interval 88.2% and 92.6%

Conclusion

  • We were given a dataset to predict the concrete compressive strength, given the independent features - cement, water, fly ash, slag, superplastic , age.
  • We found that this is a regression problem and choose RMSE as the metric to evaluate the performance of the model
  • Then we performed univariate, bivariate analysis to understand the data and found that not all the variables were highly correlated with the target.
  • There were few outliers within slag, age, water component, we imputed the ouliers with 1% and 95% values
  • Then, created composite features -

    • water_cement_ratio
    • coarse_fine_agg_ratio
      • water_binder_ratio
  • We observered a lot of correlation with the composite features and the base features using which they are derived. So. lets drop cement, water , coarseagg , fineagg , ash, slag columns as we have derived new features out of these

  • We also explored the data for any clusters as there appeared to be multiple gaussians in the independent features. We split the data into 3 clusters using KMean clustering. However, no distinct clusters seemed to be visible with the avaiable features.

  • Then we trained the model using various algorithms and evalutaed the performance of the model

    • Linear models were not giving good performance, even with polynomial features of degree 2 for the compposite features.So, we did not consider Linear models.
      • Decision Tree, RandomForest and Adaboost seemed to be highly overfitting the data.
      • We considered GradientBoost and XGBoost for further tuning
  • We tuned the hyperparameters for GradientBoost and XGBoost using RandomizedSearch and GridSearch, found that both were giving similar performance.

At 95% confidence interval our model performs between - 88.2 % and 92.6%

In [ ]: